Joint Inference of Identity by Descent Along Multiple Chromosomes from Population Samples

There has been much interest in detecting genomic identity by descent (IBD) segments from modern dense genetic marker data and in using them to identify human disease susceptibility loci. Here we present a novel Bayesian framework using Markov chain Monte Carlo (MCMC) realizations to jointly infer IBD states among multiple individuals not known to be related, together with the allelic typing error rate and the IBD process parameters. The data are phased single nucleotide polymorphism (SNP) haplotypes. We model changes in latent IBD state along homologous chromosomes by a continuous-time Markov model having the Ewens sampling formula as its stationary distribution. We show by simulation that this model for the IBD process agrees well with coalescent predictions. Using simulated data sets of 40 haplotypes over regions of 1 and 10 million base pairs (Mbp), we show that the jointly estimated IBD states are very close to the true values, although the presence of linkage disequilibrium decreases the accuracy. We also present comparisons with the ibd_haplo program, which estimates IBD among sets of four haplotypes. Our new IBD detection method targets the scale between genome-wide methods that use simple IBD models and complex coalescent-based methods that are limited to short genome segments. At the scale of a few Mbp, our approach potentially offers more power for fine-scale IBD association mapping.
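
To make the stationary distribution concrete: an IBD state among n haplotypes at one locus can be viewed as a partition of those haplotypes into groups that are mutually IBD. The following is a minimal sketch of the standard Ewens sampling formula over such partitions. It is not the implementation described above; the parameter name `theta` and all function names are assumptions introduced for illustration, and the abstract does not say how the paper parameterizes the process.

```python
from math import factorial


def rising_factorial(theta, n):
    """Return theta * (theta + 1) * ... * (theta + n - 1)."""
    result = 1.0
    for i in range(n):
        result *= theta + i
    return result


def ewens_partition_prob(block_sizes, theta):
    """Ewens probability of one specific set partition of labeled items.

    block_sizes: sizes of the groups of haplotypes that are mutually IBD.
    theta: positive parameter of the Ewens formula (hypothetical name).
    """
    n = sum(block_sizes)
    k = len(block_sizes)
    numerator = theta ** k
    for size in block_sizes:
        numerator *= factorial(size - 1)
    return numerator / rising_factorial(theta, n)


def set_partitions(items):
    """Yield every partition of a list of items as a list of blocks."""
    if not items:
        yield []
        return
    first, rest = items[0], items[1:]
    for partition in set_partitions(rest):
        # Place `first` in a new block of its own.
        yield [[first]] + partition
        # Or add `first` to each existing block in turn.
        for i in range(len(partition)):
            yield partition[:i] + [[first] + partition[i]] + partition[i + 1:]


def ibd_state_distribution(n, theta):
    """Return (partition, probability) pairs for all IBD states of n haplotypes."""
    states = []
    for partition in set_partitions(list(range(n))):
        sizes = [len(block) for block in partition]
        states.append((partition, ewens_partition_prob(sizes, theta)))
    return states
```

For four haplotypes, `ibd_state_distribution(4, theta)` enumerates the 15 possible IBD states, and their probabilities sum to one for any positive `theta`. The number of states grows as the Bell numbers, so exhaustive enumeration of this kind is only practical for small n.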
