GERMLINE

Installation

From the command line, extract germline with tar xzvf germline-X-X-X.tar.gz, enter the extracted directory, and compile germline with make all. A simple test case can be run using make test.

Run & Input

GERMLINE can be run directly from the command line with germline -input [ped file] [map file] -output [output prefix]. It accepts input in the PLINK PED+MAP format.
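When GERMLINE is driven from a script, it can help to build the argument list explicitly rather than concatenating strings. The sketch below only mirrors the invocation shown above. The helper name germline_argv, the file names, and the assumption that extra option flags may be appended after -output (that is, that GERMLINE does not care about option order) are all hypothetical.

```python
import shlex
from typing import Sequence


def germline_argv(ped: str, map_file: str, prefix: str,
                  extra_flags: Sequence[str] = ()) -> list[str]:
    """Build the argument vector for a basic GERMLINE run (hypothetical helper).

    Assumes a compiled `germline` binary is on PATH and that any extra flags
    (e.g. "-bits", "128") may follow -output; option order is an assumption.
    """
    return ["germline", "-input", ped, map_file, "-output", prefix, *extra_flags]


if __name__ == "__main__":
    argv = germline_argv("study.ped", "study.map", "out/study")
    # Prints: germline -input study.ped study.map -output out/study
    print(shlex.join(argv))
    # To actually run it (requires the compiled binary):
    # import subprocess; subprocess.run(argv, check=True)
```

Keeping the arguments as a list avoids shell quoting problems when paths contain spaces.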

We strongly recommend phasing your data with EAGLE or SHAPEIT before running GERMLINE. A script for converting from the IMPUTE format is provided (bin/impute_to_ped). A legacy pipeline for phasing with BEAGLE is also provided.

NOTE: Although the PLINK format is not natively intended for haplotypes, GERMLINE expects the two alleles at each marker to appear in haplotype order: the first allele always belongs to one haplotype and the second allele to the other. PLINK may arbitrarily reorder alleles when processing files, so we do not recommend handling phased data with PLINK before GERMLINE analysis, because the haplotypes may no longer be intact.
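To make the ordering requirement concrete, the minimal sketch below writes one PLINK PED line from two phased haplotypes so that, at every marker, the first allele comes from haplotype 0 and the second from haplotype 1. It uses the standard six leading PED columns (family ID, individual ID, paternal ID, maternal ID, sex, phenotype); the function name and the placeholder values for the parent, sex, and phenotype columns are assumptions for illustration.

```python
from typing import Sequence


def phased_ped_line(fid: str, iid: str, hap0: Sequence[str], hap1: Sequence[str],
                    pid: str = "0", mid: str = "0", sex: str = "0",
                    phenotype: str = "-9") -> str:
    """Format one PLINK PED line whose allele pairs preserve haplotype order.

    For marker k the pair written is (hap0[k], hap1[k]), so the first allele of
    every pair always comes from haplotype 0 and the second from haplotype 1.
    """
    if len(hap0) != len(hap1):
        raise ValueError("haplotypes must cover the same markers")
    alleles: list[str] = []
    for a0, a1 in zip(hap0, hap1):
        alleles.extend((a0, a1))  # never sort or swap: the order encodes phase
    return " ".join([fid, iid, pid, mid, sex, phenotype, *alleles])


if __name__ == "__main__":
    # Haplotype 0 = A C G, haplotype 1 = A T G
    print(phased_ped_line("FAM1", "IND1", "ACG", "ATG"))
    # Prints: FAM1 IND1 0 0 0 -9 A A C T G G
```

Any tool that "normalizes" genotypes (for example by sorting each allele pair) would turn C T into the same pair regardless of phase, which is exactly the information GERMLINE needs.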

Output

Upon completion, GERMLINE generates a .match and .log file in the specified location. Each line in the .match file corresponds to a pairwise shared segment, with the following fields:

Options

The program has several command line options to direct the segmental sharing process:

Flag Description
-map File location for the genetic distance map. Uses the PLINK map format.
-min_m Minimum length for a match to be reported (in cM or MB). Default: 3
-err_hom The maximum number of mismatching homozygous markers for a slice to still be considered part of a match. Default: 1
-err_het The maximum number of mismatching heterozygous markers for a slice to still be considered part of a match. Default: 0
-h_extend Extend from exact seeds using haplotypes rather than genotypes. This is the default behavior.
-g_extend Extend from exact seeds using homozygous sites only; recommended only when the input data is very noisy.
-homoz Allow self matches (runs of homozygosity)
-homoz-only Analyze and report only autozygous/homozygous segments. No IBD between individuals is reported, but the analysis is significantly faster.
-haploid Treat each input individual as two distinct and separate haplotypes. Output IDs will have .0/.1 suffix corresponding to each haplotype. The -err_het flag will have no effect in this analysis.
-bin_out Write matches in binary format, creating *.bmatch, *.bsid, and *.bmid files. These files can be converted to flat output with the parse_bmatch utility, which is included and compiled with the package.
-bits Size of each slice (in markers) used for exact matching seeds. Default: 128
-w_extend Extend the match beyond the slice end to the first mismatching marker.
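The -bits, -err_hom/-err_het, and -min_m options describe a seed-and-extend search: haplotypes are cut into fixed-width slices, identical slices serve as exact seeds, and seeds are extended across neighboring slices that contain only a few mismatches. The toy sketch below illustrates that idea and is not GERMLINE's implementation. It is simplified under stated assumptions: a single max_err threshold replaces the separate homozygous/heterozygous limits, segment length is measured in slices rather than cM or MB, and the function name and parameters are hypothetical.

```python
from collections import defaultdict
from itertools import combinations


def find_shared_segments(haps: list[str], bits: int, max_err: int,
                         min_slices: int) -> list[tuple[int, int, int, int]]:
    """Toy slice-based seed-and-extend search (illustration only).

    Returns (hap_i, hap_j, start_marker, end_marker) tuples, end exclusive.
    """
    n = len(haps[0])
    starts = list(range(0, n, bits))  # first marker of each slice

    # 1. Exact seeds: haplotypes with an identical slice land in one bucket.
    seeds: dict[tuple[int, int], set[int]] = defaultdict(set)
    for s, start in enumerate(starts):
        buckets: dict[str, list[int]] = defaultdict(list)
        for h, hap in enumerate(haps):
            buckets[hap[start:start + bits]].append(h)
        for members in buckets.values():
            for pair in combinations(members, 2):
                seeds[pair].add(s)

    def slice_ok(i: int, j: int, s: int) -> bool:
        # A slice may join a match if it has at most max_err mismatching markers.
        a = haps[i][starts[s]:starts[s] + bits]
        b = haps[j][starts[s]:starts[s] + bits]
        return sum(x != y for x, y in zip(a, b)) <= max_err

    # 2. Extend each seed left and right, slice by slice.
    segments = []
    for (i, j), seed_slices in seeds.items():
        covered: set[int] = set()
        for s in sorted(seed_slices):
            if s in covered:  # already part of an earlier extended match
                continue
            lo = s
            while lo > 0 and slice_ok(i, j, lo - 1):
                lo -= 1
            hi = s
            while hi + 1 < len(starts) and slice_ok(i, j, hi + 1):
                hi += 1
            covered.update(range(lo, hi + 1))
            # 3. Keep only matches long enough (stand-in for -min_m).
            if hi - lo + 1 >= min_slices:
                segments.append((i, j, starts[lo], min(starts[hi] + bits, n)))
    return sorted(segments)


if __name__ == "__main__":
    haps = ["00001111", "00001111", "00001110", "11110000"]
    print(find_shared_segments(haps, bits=4, max_err=1, min_slices=2))
    # Prints: [(0, 1, 0, 8), (0, 2, 0, 8), (1, 2, 0, 8)]
```

With max_err=0 the same input yields only [(0, 1, 0, 8)], because haplotype 2 differs from the others by one marker in its second slice; this mirrors how raising the per-slice error tolerance lets matches extend through isolated mismatches.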