Detection of identity by descent using next-generation whole genome sequencing data

Background: Identity by descent (IBD) has played a fundamental role in the discovery of genetic loci underlying human diseases. Both pedigree-based and population-based linkage analyses rely on estimating recent IBD, and evidence of ancient IBD can be used to detect population structure in genetic association studies. Various methods for detecting IBD, including those implemented in the software programs fastIBD and GERMLINE, have been developed in the past several years using population genotype data from microarray platforms. Next-generation DNA sequencing data are now becoming increasingly available, enabling the comprehensive analysis of genomes, including the identification of rare variants. These sequencing data may provide an opportunity to detect IBD at higher resolution than previously possible, potentially enabling the detection of disease-causing loci that were undetectable with sparser genetic data.

Results: Here, we investigate how different levels of variant coverage in sequencing and microarray genotype data influence the resolution at which IBD can be detected. The data examined include microarray genotype data from the WTCCC study, denser genotype data from the HapMap Project, low-coverage sequencing data from the 1000 Genomes Project, and deep-coverage complete genome data from our own projects. Using fastIBD and GERMLINE on sequencing data, we can detect segments of length 0.4 cM or larger with high power (78%). By comparison, similar power is achieved with microarray genotype data for segments of length 1.0 cM or larger. We find that GERMLINE has slightly higher power than fastIBD for detecting IBD segments in sequencing data, but also a much higher false positive rate.

Conclusion: We further quantify the effect of variant density, conditional on genetic map length, on the power to resolve IBD segments. These investigations into IBD resolution may help guide the design of future next-generation sequencing studies that utilize IBD, including family-based association studies, association studies in admixed populations, and homozygosity mapping studies.
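To make the notion of power as a function of minimum segment length concrete, the following is a minimal sketch of one way such a figure could be computed from a set of known (for example, simulated) IBD segments and a set of segments reported by a detection program. It is not the evaluation procedure used in this study. The function names, the `(start_cM, end_cM)` segment representation, and the 50% coverage threshold for counting a true segment as detected are all assumptions made for illustration.

```python
from typing import List, Tuple

# Hypothetical representation: one segment is (start_cM, end_cM) on a single
# chromosome for a single pair of haplotypes.
Segment = Tuple[float, float]


def overlap(a: Segment, b: Segment) -> float:
    """Length in cM shared by two segments (0.0 if they do not overlap)."""
    return max(0.0, min(a[1], b[1]) - max(a[0], b[0]))


def covered_fraction(true_seg: Segment, detected: List[Segment]) -> float:
    """Fraction of a true segment covered by detected segments.

    Assumes the detected segments do not overlap one another.
    """
    length = true_seg[1] - true_seg[0]
    total = sum(overlap(true_seg, d) for d in detected)
    return min(1.0, total / length)


def detection_power(
    true_segs: List[Segment],
    detected: List[Segment],
    min_len_cm: float,
    min_cover: float = 0.5,  # assumed threshold, not taken from the paper
) -> float:
    """Share of true segments of at least min_len_cm that count as detected."""
    eligible = [s for s in true_segs if s[1] - s[0] >= min_len_cm]
    if not eligible:
        return float("nan")
    hits = sum(1 for s in eligible if covered_fraction(s, detected) >= min_cover)
    return hits / len(eligible)


if __name__ == "__main__":
    # Toy data, invented purely for illustration.
    true_segs = [(0.0, 0.5), (2.0, 3.2), (5.0, 5.3)]
    detected = [(0.1, 0.5), (2.0, 2.4)]
    for threshold in (0.4, 1.0):
        print(threshold, detection_power(true_segs, detected, threshold))
```

Evaluating `detection_power` over a range of `min_len_cm` values, separately for each data set and program, would give a power curve of the kind summarized above. A false positive rate could be defined analogously by swapping the roles of the true and detected segments.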
