Improving the Accuracy and Efficiency of Identity-by-Descent Detection in Population Data

Segments of indentity-by-descent (IBD) detected from high-density genetic data are useful for many applications, including long-range phase determination, phasing family data, imputation, IBD mapping, and heritability analysis in founder populations. We present Refined IBD, a new method for IBD segment detection. Refined IBD achieves both computational efficiency and highly accurate IBD segment reporting by searching for IBD in two steps. The first step (identification) uses the GERMLINE algorithm to find shared haplotypes exceeding a length threshold. The second step (refinement) evaluates candidate segments with a probabilistic approach to assess the evidence for IBD. Like GERMLINE, Refined IBD allows for IBD reporting on a haplotype level, which facilitates determination of multi-individual IBD and allows for haplotype-based downstream analyses. To investigate the properties of Refined IBD, we simulate SNP data from a model with recent superexponential population growth that is designed to match United Kingdom data. The simulation results show that Refined IBD achieves a better power/accuracy profile than fastIBD or GERMLINE. We find that a single run of Refined IBD achieves greater power than 10 runs of fastIBD. We also apply Refined IBD to SNP data for samples from the United Kingdom and from Northern Finland and describe the IBD sharing in these data sets. Refined IBD is powerful, highly accurate, and easy to use and is implemented in Beagle version 4.

[1]  Brian L Browning,et al.  Identity by descent between distant relatives: detection and applications. , 2012, Annual review of genetics.

[2]  I. Pe’er,et al.  Length distributions of identity by descent reveal fine-scale demographic history. , 2012, American journal of human genetics.

[3]  B. Browning,et al.  Identity-by-descent-based heritability analysis in the Northern Finland Birth Cohort , 2012, Human Genetics.

[4]  H. Ostrer,et al.  North African Jewish and non-Jewish populations form distinctive, orthogonal clusters , 2012, Proceedings of the National Academy of Sciences.

[5]  Peter L. Ralph,et al.  The Geography of Recent Genetic Ancestry across Europe , 2012, PLoS biology.

[6]  Mark Abney,et al.  Using identity by descent estimation with dense genotype data to detect positive selection , 2012, European Journal of Human Genetics.

[7]  Ole A. Andreassen,et al.  A mutation in APP protects against Alzheimer’s disease and age-related cognitive decline , 2012, Nature.

[8]  Claudio J. Verzilli,et al.  An Abundance of Rare Functional Variants in 202 Drug Target Genes Sequenced in 14,002 People , 2012, Science.

[9]  A. Clark,et al.  Recent Explosive Human Population Growth Has Resulted in an Excess of Rare Genetic Variants , 2012, Science.

[10]  Sharon R. Browning,et al.  Detecting Rare Variant Associations by Identity-by-Descent Mapping in Case-Control Studies , 2012, Genetics.

[11]  M. D. Brown,et al.  Inferring Coancestry in Population Samples in the Presence of Linkage Disequilibrium , 2012, Genetics.

[12]  Alexander Gusev,et al.  The architecture of long-range haplotypes shared within and across populations. , 2012, Molecular biology and evolution.

[13]  E. Lander,et al.  The mystery of missing heritability: Genetic interactions create phantom heritability , 2012, Proceedings of the National Academy of Sciences.

[14]  B. Browning,et al.  Haplotype phasing: existing methods and new developments , 2011, Nature Reviews Genetics.

[15]  Mark Abney,et al.  Identity by descent estimation with dense genome‐wide genotype data , 2011, Genetic epidemiology.

[16]  Alexander Gusev,et al.  DASH: a method for identical-by-descent haplotype mapping uncovers association with recent variation. , 2011, American journal of human genetics.

[17]  Alun Thomas,et al.  Identification of regions of positive selection using Shared Genomic Segment analysis , 2011, European Journal of Human Genetics.

[18]  R. Nielsen,et al.  A method for detecting IBD regions simultaneously in multiple individuals--with applications to disease genetics. , 2011, Genome research.

[19]  B. Browning,et al.  A fast, powerful method for detecting identity by descent. , 2011, American journal of human genetics.

[20]  Alkes L. Price,et al.  Single-Tissue and Cross-Tissue Heritability of Gene Expression Via Identity-by-Descent in Related or Unrelated Individuals , 2011, PLoS genetics.

[21]  G. Abecasis,et al.  MaCH: using sequence and genotype data to estimate haplotypes and unobserved genotypes , 2010, Genetic epidemiology.

[22]  Taylor J. Maxwell,et al.  Deep resequencing reveals excess rare recent variants consistent with explosive population growth , 2010, Nature communications.

[23]  D. Altshuler,et al.  A map of human genome variation from population-scale sequencing , 2010, Nature.

[24]  Brian L. Browning,et al.  High-resolution detection of identity by descent in unrelated individuals. , 2010, American journal of human genetics.

[25]  Simon C. Potter,et al.  Genome-wide association study of ulcerative colitis identifies three new susceptibility loci, including the HNF4A region , 2009, Nature Genetics.

[26]  Anders Albrechtsen,et al.  Relatedness mapping and tracts of relatedness for genome‐wide data in the presence of linkage disequilibrium , 2009, Genetic epidemiology.

[27]  B. Browning,et al.  A unified approach to genotype imputation and haplotype-phase inference for large data sets of trios and unrelated individuals. , 2009, American journal of human genetics.

[28]  C. Hoggart,et al.  Genome-wide association analysis of metabolic traits in a birth cohort from a founder population , 2008, Nature Genetics.

[29]  Pall I. Olason,et al.  Detection of sharing by descent, long-range phasing and haplotype imputation , 2008, Nature Genetics.

[30]  B. Browning,et al.  Rapid and accurate haplotype phasing and missing-data inference for whole-genome association studies by use of localized haplotype clustering. , 2007, American journal of human genetics.

[31]  Zhaohui S. Qin,et al.  A second generation human haplotype map of over 3.1 million SNPs , 2007, Nature.

[32]  Manuel A. R. Ferreira,et al.  PLINK: a tool set for whole-genome association and population-based linkage analyses. , 2007, American journal of human genetics.

[33]  B. Browning,et al.  Efficient multilocus association testing for whole genome association studies using localized haplotype clustering , 2007, Genetic epidemiology.

[34]  Peter M Visscher,et al.  Recent human effective population size estimated from linkage disequilibrium. , 2007, Genome research.

[35]  Paul Scheet,et al.  A fast and flexible statistical model for large-scale population genotype data: applications to inferring missing genotypes and haplotypic phase. , 2006, American journal of human genetics.

[36]  M. Nachman,et al.  Estimate of the mutation rate per nucleotide in humans. , 2000, Genetics.

[37]  Lawrence R. Rabiner,et al.  A tutorial on hidden Markov models and selected applications in speech recognition , 1989, Proc. IEEE.

[38]  Andrew J. Viterbi,et al.  Error bounds for convolutional codes and an asymptotically optimum decoding algorithm , 1967, IEEE Trans. Inf. Theory.

[39]  Sequence analysis Advance Access publication June 7, 2011 The variant call format and VCFtools , 2010 .

[40]  Alexander Gusev,et al.  Whole population, genome-wide mapping of hidden relatedness. , 2009, Genome research.

[41]  Weihua Chang,et al.  Whole-genome genotyping with the single-base extension assay , 2005, Nature Methods.

[42]  L. Baum,et al.  An inequality and associated maximization technique in statistical estimation of probabilistic functions of a Markov process , 1972 .

[43]  L. Excoffier,et al.  Bioinformatics Applications Note Genetics and Population Analysis Fastsimcoal: a Continuous-time Coalescent Simulator of Genomic Diversity under Arbitrarily Complex Evolutionary Scenarios , 2022 .