Rapid, Phase-free Detection of Long Identical by Descent Segments Enables Effective Relationship Classification.

Identical by descent (IBD) segments are a useful tool for applications ranging from demographic inference to relationship classification, but most detection methods rely on phasing information and therefore require substantial computation time. As genetic datasets grow, IBD inference methods that scale well will be critical. We developed IBIS, an IBD detector that locates long regions of allele sharing between unphased individuals, and benchmarked it against Refined IBD, GERMLINE, and TRUFFLE on 3,000 simulated individuals. Phasing these samples with Beagle 5 takes 4.3 CPU days, after which segment detection takes 2.9 h with Refined IBD or 1.1 h with GERMLINE. By comparison, IBIS finishes in 6.8 min, or 7.8 min with IBD2 functionality enabled, for speedups of 805-946× once phasing time is included. TRUFFLE, which also works on unphased data, takes 2.6 h, corresponding to IBIS speedups of 20.2-23.3×. IBIS is also accurate, inferring ≥7 cM IBD segments at quality comparable to Refined IBD and GERMLINE. With these segments, IBIS classifies first through third degree relatives in real Mexican American samples at rates meeting or exceeding those of the other methods tested, and identifies fourth through sixth degree pairs at rates within 0.0%-2.0% of the top method. Allele frequency-based approaches that do not detect segments can infer relationship degrees faster than IBIS, but the fastest are biased in admixed samples: KING correctly infers 30.8% fewer fifth degree Mexican American relatives than IBIS. Finally, we ran IBIS on chromosome 2 of the UK Biobank dataset and estimate its runtime on the autosomes to be 3.3 days when parallelized across 128 cores.
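The speedup figures above follow from simple ratios of total runtime. The short Python sketch below recomputes them from the rounded timings quoted in the abstract; the variable and function names are illustrative only and do not come from the paper or any of the tools. Because the inputs are rounded, the results should be expected to land near, but not exactly on, the reported 805-946× and 20.2-23.3× ranges.

```python
# Rounded timings quoted in the abstract (3,000 simulated individuals).
# All names here are hypothetical and chosen only for this illustration.
PHASING_MIN = 4.3 * 24 * 60                     # Beagle 5 phasing: CPU days -> minutes
DETECT_MIN = {"Refined IBD": 2.9 * 60,          # segment detection: hours -> minutes
              "GERMLINE": 1.1 * 60}
TRUFFLE_MIN = 2.6 * 60                          # TRUFFLE needs no phasing step
IBIS_MIN = {"without IBD2": 6.8, "with IBD2": 7.8}


def speedup_range(baseline_minutes, ibis_minutes):
    """Return the (smallest, largest) ratio of baseline runtime to IBIS runtime."""
    ratios = [b / i for b in baseline_minutes for i in ibis_minutes]
    return min(ratios), max(ratios)


# Phase-based pipelines: phasing time plus detection time.
phased = speedup_range(
    [PHASING_MIN + d for d in DETECT_MIN.values()],
    list(IBIS_MIN.values()),
)
# TRUFFLE: detection time only.
truffle = speedup_range([TRUFFLE_MIN], list(IBIS_MIN.values()))

print(f"vs. phasing + Refined IBD/GERMLINE: {phased[0]:.0f}-{phased[1]:.0f}x")
print(f"vs. TRUFFLE: {truffle[0]:.1f}-{truffle[1]:.1f}x")
```

The calculation also shows that phasing dominates the cost of the phase-based pipelines: 4.3 CPU days is roughly 103 h, against at most 2.9 h for segment detection itself.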
