Estimating genealogies from linked marker data: a Bayesian approach

Background

Answers to several fundamental questions in statistical genetics would ideally require knowledge of the ancestral pedigree and of the gene flow therein. Examples of such questions include haplotype estimation, relatedness and relationship estimation, gene mapping by combining pedigree and linkage disequilibrium information, and estimation of population structure.

Results

We present a probabilistic method for genealogy reconstruction. Starting with a group of genotyped individuals from a population isolate, we explore the state space of their possible ancestral histories under our Bayesian model using Markov chain Monte Carlo (MCMC) sampling techniques. The main contribution of our work is the development of sampling algorithms for the resulting vast state space with highly dependent variables. The main drawback is computational complexity, which limits the time horizon within which explicit reconstructions can be carried out in practice.

Conclusion

The estimates for identity-by-descent (IBD) and haplotype distributions are tested in several settings using simulated data. The results appear promising for further development of the method.
