Benchmarking Relatedness Inference Methods with Genome-Wide Data from Thousands of Relatives

Relatedness inference is an essential component of many genetic analyses and popular in consumer genetic testing. Ramstetter et al. evaluate twelve...

Inferring relatedness from genomic data is an essential component of genetic association studies, population genetics, forensics, and genealogy. While numerous methods exist for inferring relatedness, thorough evaluation of these approaches in real data has been lacking. Here, we report an assessment of 12 state-of-the-art pairwise relatedness inference methods using a data set of 2485 individuals from several large pedigrees that span up to six generations. We find that all methods have high accuracy (92–99%) when detecting first- and second-degree relationships, but their accuracy drops to <43% for seventh-degree relationships. However, most identity-by-descent (IBD) segment-based methods inferred seventh-degree relatives correctly to within one relatedness degree for >76% of relative pairs. Overall, the most accurate methods are Estimation of Recent Shared Ancestry (ERSA) and approaches that infer relatedness from total IBD sharing computed from the output of GERMLINE and Refined IBD. Combining information from the most accurate methods provides little accuracy improvement, indicating that novel approaches, such as methods that leverage relatedness signals from multiple samples, are needed to achieve a sizeable jump in performance.
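The two accuracy figures quoted above (exact degree, and correct to within one degree) can be illustrated with a minimal Python sketch. It is not the evaluation code used in the study. It assumes the common convention that a degree-d relationship has expected kinship coefficient 2^-(d+1), so a pairwise kinship estimate is assigned to the nearest degree on a log2 scale. The function names `kinship_to_degree` and `degree_accuracy`, the `max_degree` cutoff, and the toy values are all hypothetical.

```python
import math


def kinship_to_degree(phi, max_degree=7):
    """Map a kinship coefficient to the nearest relatedness degree.

    Assumption (not from the paper): expected kinship for degree d is
    2 ** -(d + 1), e.g. 0.25 for first degree, 0.125 for second degree.
    Returns 0 for duplicate/monozygotic-twin level sharing and None when
    the pair looks unrelated or more distant than max_degree.
    """
    if phi <= 0:
        return None
    degree = max(round(-math.log2(phi) - 1), 0)
    return degree if degree <= max_degree else None


def degree_accuracy(true_degrees, inferred_degrees, tolerance=0):
    """Fraction of pairs whose inferred degree is within `tolerance`
    degrees of the pedigree-reported degree (None counts as a miss)."""
    pairs = list(zip(true_degrees, inferred_degrees))
    hits = sum(
        1 for truth, guess in pairs
        if guess is not None and abs(truth - guess) <= tolerance
    )
    return hits / len(pairs)


# Toy example with made-up kinship estimates for four relative pairs.
true_degrees = [1, 2, 7, 7]
kinship_estimates = [0.25, 0.12, 2 ** -7, 2 ** -8]
inferred = [kinship_to_degree(phi) for phi in kinship_estimates]

exact = degree_accuracy(true_degrees, inferred)
within_one = degree_accuracy(true_degrees, inferred, tolerance=1)
```

In this toy input one seventh-degree pair is called sixth degree, so it counts as a miss under the exact metric but as a hit under the within-one-degree metric, which is the distinction the abstract draws for distant relatives.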
