Installation
From the command line, extract germline with tar xzvf germline-X-X-X.tar.gz, enter the extracted directory, and compile germline with make all. A simple test case can be run using make test.
Run & Input
GERMLINE can be executed directly from the command line by running germline -input [ped file] [map file] -output [output prefix]. GERMLINE accepts input in the PLINK ped+map format.
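For scripted runs, the invocation above can be assembled as an argument list. The following is a minimal sketch, not part of the GERMLINE package: the helper name `build_germline_command` and the file names are hypothetical, and it assumes the `germline` binary is on your PATH. It only builds and prints the command; it does not run it.

```python
import shlex


def build_germline_command(ped_file, map_file, output_prefix, extra_flags=()):
    """Return the germline argument list for the documented basic invocation.

    extra_flags is passed through unchanged (for example ["-min_m", "5"]);
    this helper does not validate flags.
    """
    command = ["germline", "-input", ped_file, map_file, "-output", output_prefix]
    command.extend(extra_flags)
    return command


if __name__ == "__main__":
    # Hypothetical file names, for illustration only.
    argv = build_germline_command("study.ped", "study.map", "results/study")
    print(shlex.join(argv))
```

The resulting list can be handed to `subprocess.run` once the input files exist.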
We strongly recommend phasing your data using EAGLE or SHAPEIT prior to running GERMLINE. A script for converting from the IMPUTE format is provided (bin/impute_to_ped). A legacy pipeline for phasing with BEAGLE is also provided.
NOTE: Although the PLINK format is not natively intended for haplotypes, GERMLINE expects the respective alleles to appear in order; i.e. the first allele always corresponds to one haplotype and the second allele to the other. PLINK arbitrarily re-orders the alleles in processing the files, so we do not recommend handling phased data with PLINK prior to GERMLINE analysis because the haplotypes may not be intact.
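To make the allele-order convention concrete, the sketch below splits one phased PED line into its two haplotypes. It assumes the standard PLINK PED layout (six leading columns followed by whitespace-separated allele pairs); the function name `split_ped_haplotypes` and the sample line are hypothetical and not part of GERMLINE.

```python
def split_ped_haplotypes(ped_line):
    """Split a phased PED line into (metadata, haplotype_0, haplotype_1).

    Assumes the first allele of every pair belongs to one haplotype and the
    second allele to the other, as GERMLINE expects.
    """
    fields = ped_line.split()
    metadata, alleles = fields[:6], fields[6:]
    if len(alleles) % 2 != 0:
        raise ValueError("expected an even number of allele columns")
    haplotype_0 = alleles[0::2]  # first allele of each marker
    haplotype_1 = alleles[1::2]  # second allele of each marker
    return metadata, haplotype_0, haplotype_1


if __name__ == "__main__":
    # Hypothetical individual with three markers.
    meta, hap0, hap1 = split_ped_haplotypes("FAM1 IND1 0 0 1 -9 A G C C T A")
    print(meta[1], "".join(hap0), "".join(hap1))
```

If a tool swaps the two alleles at any marker, the strings returned here no longer represent the true haplotypes, which is why re-ordering breaks the analysis.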
Output
Upon completion, GERMLINE generates a .match file and a .log file at the specified output prefix. Each line in the .match file corresponds to a pairwise shared segment, with the following fields:
- Family ID 1
- Individual ID 1
- Family ID 2
- Individual ID 2
- Chromosome
- Segment start (bp)
- Segment end (bp)
- Segment start (SNP)
- Segment end (SNP)
- Total SNPs in segment
- Genetic length of segment
- Units for genetic length (cM or MB)
- Mismatching SNPs in segment
- 1 if Individual 1 is homozygous in match; 0 otherwise
- 1 if Individual 2 is homozygous in match; 0 otherwise
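The field list above maps directly onto a small record type. The following is a minimal parsing sketch, assuming the fields are whitespace-separated, appear in the order listed, and that IDs contain no spaces; the class name `Match`, its attribute names, and the sample line are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Match:
    family_id_1: str
    individual_id_1: str
    family_id_2: str
    individual_id_2: str
    chromosome: str
    start_bp: int
    end_bp: int
    start_snp: str
    end_snp: str
    total_snps: int
    genetic_length: float
    length_units: str  # "cM" or "MB"
    mismatching_snps: int
    individual_1_homozygous: bool
    individual_2_homozygous: bool


def parse_match_line(line):
    """Parse one .match line into a Match, assuming 15 whitespace-separated fields."""
    f = line.split()
    if len(f) != 15:
        raise ValueError(f"expected 15 fields, found {len(f)}")
    return Match(
        f[0], f[1], f[2], f[3], f[4],
        int(f[5]), int(f[6]), f[7], f[8],
        int(f[9]), float(f[10]), f[11], int(f[12]),
        f[13] == "1", f[14] == "1",
    )


if __name__ == "__main__":
    # Made-up line for illustration; not real GERMLINE output.
    example = "FAM1 IND1 FAM2 IND2 1 1000 500000 rs1 rs99 250 3.5 cM 0 0 1"
    print(parse_match_line(example))
```

Keeping the units field alongside the length avoids silently mixing cM and MB values when filtering segments.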
Options
The program has several command line options to direct the segmental sharing process:
Flag | Description |
---|---|
-map | File location for the genetic distance map. Uses the PLINK map format. |
-min_m | Minimum length for a match to be used for imputation (in cM or MB). Default: 3 |
-err_hom | The maximum number of mismatching homozygous markers for a slice to still be considered part of a match. Default: 1 |
-err_het | The maximum number of mismatching heterozygous markers for a slice to still be considered part of a match. Default: 0 |
-h_extend | Extend from exact seeds using haplotypes rather than genotypes (default). |
-g_extend | Extend from exact seeds using homozygous sites only; this is only recommended when the input data is very noisy. |
-homoz | Allow self matches (runs of homozygosity). |
-homoz-only | Analyze and report only autozygous/homozygous segments; no IBD is reported, but the analysis is significantly faster. |
-haploid | Treat each input individual as two distinct and separate haplotypes. Output IDs will have a .0/.1 suffix corresponding to each haplotype. The -err_het flag has no effect in this analysis. |
-bin_out | Generate output matches in binary format, creating .bmatch, .bsid, and .bmid files. These files can be converted to flat output using the parse_bmatch utility included and compiled in the package. |
-bits | Size of each slice (in markers) used for exact matching seeds. Default: 128 |
-w_extend | Extend the match beyond the slice end to the first mismatching marker. |
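
To illustrate what a slice-based exact seed is, the toy sketch below cuts each haplotype into fixed-size slices (the role of -bits) and reports pairs of haplotypes that are identical within a slice. This is an assumption-laden illustration of the idea only, not GERMLINE's implementation: the function name `exact_seed_matches` and the input layout are hypothetical, and extension and error tolerance (-err_hom, -err_het) are not modeled.

```python
from collections import defaultdict


def exact_seed_matches(haplotypes, bits=128):
    """Return (name_a, name_b, slice_start) for haplotypes identical in a slice.

    haplotypes maps a haplotype name to an equal-length allele sequence.
    """
    if not haplotypes:
        return []
    n_markers = len(next(iter(haplotypes.values())))
    seeds = []
    for start in range(0, n_markers, bits):
        # Group haplotypes by the exact content of this slice.
        buckets = defaultdict(list)
        for name, alleles in haplotypes.items():
            buckets[tuple(alleles[start:start + bits])].append(name)
        # Every pair sharing a bucket is an exact seed for this slice.
        for names in buckets.values():
            for i in range(len(names)):
                for j in range(i + 1, len(names)):
                    seeds.append((names[i], names[j], start))
    return seeds


if __name__ == "__main__":
    # Tiny made-up haplotypes with a slice size of 2 markers.
    toy = {"a.0": "0011", "b.0": "0010", "c.0": "1111"}
    print(exact_seed_matches(toy, bits=2))
```

In this toy model, smaller slices yield more seeds because short stretches are more likely to match exactly, while larger slices yield fewer candidate pairs.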