Efficient Approximate Bayesian Computation Coupled With Markov Chain Monte Carlo Without Likelihood

Approximate Bayesian computation (ABC) techniques permit inference in complex demographic models, but they are computationally inefficient. A Markov chain Monte Carlo (MCMC) approach has been proposed (Marjoram et al. 2003), but it suffers from computational problems and poor mixing. We propose several methodological developments that overcome the shortcomings of this MCMC approach and thereby realize substantial computational advances over standard ABC. The principal idea is to relax the tolerance within MCMC to permit good mixing, while retaining a good approximation to the posterior through a combination of subsampling the output and regression adjustment. We also propose using a partial least-squares (PLS) transformation to choose informative statistics. We examine the accuracy of our approach for the divergence of two populations with and without migration. In that case, our ABC–MCMC approach requires considerably less computation time than conventional ABC to reach the same accuracy. We then apply our method to a more complex case: the estimation of divergence times and migration rates between three African populations.
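The relaxed-tolerance idea can be illustrated with a minimal Python sketch. It is not the authors' implementation: the toy simulator (the mean of 50 normal draws standing in for a demographic simulation), the uniform prior, the tolerance, the step size, and all function names are assumptions chosen for illustration. The regression step here is an unweighted ordinary least-squares fit, and the PLS transformation is not shown.

```python
import numpy as np

def simulate(theta, rng, n=50):
    """Hypothetical toy simulator: one summary statistic (a sample mean)."""
    return rng.normal(theta, 1.0, size=n).mean()

def abc_mcmc(s_obs, n_iter, tol, step, rng, lo=-10.0, hi=10.0):
    """Likelihood-free MCMC with a uniform prior on [lo, hi] (assumed)."""
    # Find a starting state by plain rejection sampling from the prior.
    while True:
        theta = rng.uniform(lo, hi)
        s = simulate(theta, rng)
        if abs(s - s_obs) <= tol:
            break
    chain = np.empty((n_iter, 2))
    for i in range(n_iter):
        prop = theta + rng.normal(0.0, step)  # symmetric random-walk proposal
        # With a flat prior and a symmetric proposal, the move is accepted
        # whenever the proposal lies in the prior support and the simulated
        # statistic falls within the tolerance of the observed one.
        if lo <= prop <= hi:
            s_prop = simulate(prop, rng)
            if abs(s_prop - s_obs) <= tol:
                theta, s = prop, s_prop
        chain[i] = theta, s  # store the parameter and its simulated statistic
    return chain

def subsample_and_adjust(chain, s_obs, keep=0.2):
    """Keep the closest fraction of the chain, then apply a linear adjustment."""
    dist = np.abs(chain[:, 1] - s_obs)
    idx = np.argsort(dist)[: int(keep * len(chain))]
    theta, s = chain[idx, 0], chain[idx, 1]
    # Fit theta = a + b * (s - s_obs) and project each draw onto s = s_obs.
    slope, _ = np.polyfit(s - s_obs, theta, 1)
    return theta - slope * (s - s_obs)

rng = np.random.default_rng(1)
s_obs = 0.3  # hypothetical observed statistic
# A deliberately loose tolerance lets the chain move freely.
chain = abc_mcmc(s_obs, n_iter=20000, tol=0.5, step=0.5, rng=rng)
adjusted = subsample_and_adjust(chain, s_obs, keep=0.2)
print("raw chain sd:     ", chain[:, 0].std())
print("adjusted sample sd:", adjusted.std())
```

In this toy setting the raw chain is visibly wider than the adjusted sample, which is the trade the abstract describes: a loose tolerance buys mixing, and subsampling plus regression adjustment recovers a tighter posterior approximation afterwards.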

[1] D. Cox, et al. An Analysis of Transformations, 1964.

[2] M. W. Feldman, et al. Genetic absolute dating based on microsatellites and the origin of modern humans, 1995, Proceedings of the National Academy of Sciences of the United States of America.

[3] Michel Tenenhaus, et al. Régression PLS et applications, 1995.

[4] P. Donnelly, et al. Inferring coalescence times from DNA sequence data, 1997, Genetics.

[5] M. Feldman, et al. Population growth of human Y chromosomes: a study of Y chromosome microsatellites, 1999, Molecular biology and evolution.

[6] R. Nielsen, et al. Distinguishing migration from isolation: a Markov chain Monte Carlo approach, 2001, Genetics.

[7] J. Garza, et al. Detection of reduction in population size using data from microsatellite loci, 2001, Molecular ecology.

[8] D. Balding, et al. Approximate Bayesian computation in population genetics, 2002, Genetics.

[9] M. Feldman, et al. Genetic Structure of Human Populations, 2002, Science.

[10] Richard R. Hudson, et al. Generating samples under a Wright-Fisher neutral model of genetic variation, 2002, Bioinform.

[11] Paul Marjoram, et al. Markov chain Monte Carlo without likelihoods, 2003, Proceedings of the National Academy of Sciences of the United States of America.

[12] M. Feldman, et al. Features of evolution and expansion of modern humans, inferred from genomewide microsatellite markers, 2003, American journal of human genetics.

[13] Gordon Luikart, et al. Comparative Evaluation of a New Effective Population Size Estimator Based on Approximate Bayesian Computation, 2004, Genetics.

[14] R. Nielsen, et al. Multilocus Methods for Estimating Population Sizes, Migration Rates and Divergence Time, With Applications to the Divergence of Drosophila pseudoobscura and D. persimilis, 2004, Genetics.

[15] Jean-Marie Cornuet, et al. Genetic analysis of complex demographic scenarios: spatially expanding populations of the cane toad, Bufo marinus, 2004, Evolution; international journal of organic evolution.

[16] Gabor T. Marth, et al. The Allele Frequency Spectrum in Genome-Wide Human Variation Data Reveals Signals of Differential Demographic History in Three Large World Populations, 2004, Genetics.

[17] Laurent Excoffier, et al. SIMCOAL 2.0: a program to simulate genomic diversity over large recombining regions in a subdivided population with a complex history, 2004, Bioinform.

[18] Kevin R. Thornton, et al. Multilocus patterns of nucleotide variability and the demographic and selection history of Drosophila melanogaster populations, 2005, Genome research.

[19] Nicolas Ray, et al. Bayesian Estimation of Recent Migration Rates After a Spatial Expansion, 2005, Genetics.

[20] M. Olivier. A haplotype map of the human genome, 2003, Nature.

[21] Giovanni Destro-Bisol, et al. Contrasting patterns of Y chromosome and mtDNA variation in Africa: evidence for sex-biased demographic processes, 2005, European Journal of Human Genetics.

[23] Sohini Ramachandran, et al. Support from the relationship of genetic and geographic distance in human populations for a serial founder effect originating in Africa, 2005, Proceedings of the National Academy of Sciences of the United States of America.

[24] Jean-Marie Cornuet, et al. Bayesian Analysis of an Admixture Model With Mutations and Arbitrarily Linked Markers, 2005, Genetics.

[25] Ryan D. Hernandez, et al. Natural selection on protein-coding genes in the human genome, 2005, Nature.

[26] Ryan D. Hernandez, et al. Simultaneous inference of selection and population growth from patterns of variation in the human genome, 2005, Proceedings of the National Academy of Sciences of the United States of America.

[27] S. Gabriel, et al. Calibrating a coalescent simulation of human genome sequence variation, 2005, Genome research.

[28] R. Nielsen. Molecular signatures of natural selection, 2005, Annual review of genetics.

[29] Shameek Biswas, et al. Genomic insights into positive selection, 2006, Trends in genetics: TIG.

[30] Donald B. Rubin, et al. Validation of Software for Bayesian Models Using Posterior Quantiles, 2006.

[31] Adrian W. Briggs, et al. Analysis of one million base pairs of Neanderthal DNA, 2006, Nature.

[32] Vincent Plagnol, et al. Possible Ancestral Structure in Human Populations, 2006, PLoS genetics.

[33] James I. Mullins, et al. Evolution of intrahost HIV-1 genetic diversity during chronic infection, 2006, Evolution; international journal of organic evolution.

[34] H. A. Lessios, et al. Test for simultaneous divergence using approximate Bayesian computation, 2006, Evolution; international journal of organic evolution.

[35] E. Hadly, et al. Bayesian Estimation of the Timing and Severity of a Population Bottleneck from Ancient DNA, 2006, PLoS genetics.

[36] S. Tavaré, et al. Modern computational approaches for analysing molecular genetic variation data, 2006, Nature Reviews Genetics.

[37] Timothy B. Stockwell, et al. The Diploid Genome Sequence of an Individual Human, 2007, PLoS biology.

[38] Ron Wehrens, et al. The pls Package: Principal Component and Partial Least Squares Regression in R, 2007.

[39] Carsten Wiuf, et al. Using Likelihood-Free Inference to Compare Evolutionary Dynamics of the Protein Networks of H. pylori and P. falciparum, 2007, PLoS Comput. Biol.

[40] S. Coles, et al. Inference for Stereological Extremes, 2007.

[41] Jean-Marie Cornuet, et al. Bread, beer and wine: Saccharomyces cerevisiae diversity reflects human history, 2007, Molecular ecology.

[42] Anne-Laure Boulesteix, et al. Partial least squares: a versatile tool for the analysis of high-dimensional genomic data, 2006, Briefings Bioinform.

[43] Michael J. Hickerson, et al. A multilocus perspective on colonization accompanied by selection and gene flow, 2007, Evolution; international journal of organic evolution.

[44] Liu Xianming, et al. A Time Petri Net Extended with Price Information, 2007.

[45] L. Excoffier, et al. Statistical evaluation of alternative models of human evolution, 2007, Proceedings of the National Academy of Sciences.

[46] Mark M. Tanaka, et al. Sequential Monte Carlo without likelihoods, 2007, Proceedings of the National Academy of Sciences.

[47] Laurent Excoffier, et al. Arlequin (version 3.0): An integrated software package for population genetics data analysis, 2005, Evolutionary bioinformatics online.

[48] Zhaohui S. Qin, et al. A second generation human haplotype map of over 3.1 million SNPs, 2007, Nature.

[49] M. Przeworski, et al. A new approach to estimate parameters of speciation models with application to apes, 2007, Genome research.

[50] Vladimir V. Apanasovich, et al. Fluorescence Lifetime Imaging Microscopy (FLIM) data analysis with TIMP, 2007.

[51] J. Hey, et al. Evolution of population structure in a highly social top predator, the killer whale, 2007, Molecular biology and evolution.

[52] D. Liberles, et al. The quest for natural selection in the age of comparative genomics, 2007, Heredity.

[53] A. Clark, et al. Recent and ongoing selection in the human genome, 2007, Nature Reviews Genetics.

[54] R. Huey, et al. Introduction history of Drosophila subobscura in the New World: a microsatellite‐based survey using ABC methods, 2007, Molecular ecology.

[55] Jody Hey, et al. Integration within the Felsenstein equation for improved Markov chain Monte Carlo methods in population genetics, 2007, Proceedings of the National Academy of Sciences.

[56] Jean-Marie Cornuet, et al. Inferring population history with DIY ABC: a user-friendly approach to approximate Bayesian computation, 2008, Bioinform.

[57] K. Kidd, et al. Maternal traces of deep common ancestry and asymmetric gene flow between Pygmy hunter–gatherers and Bantu-speaking farmers, 2008, Proceedings of the National Academy of Sciences.

[58] J. Marin, et al. Adaptivity for ABC algorithms: the ABC-PMC scheme, 2008.

[59] Nicolas Ray, et al. Colonization history of the Swiss Rhine basin by the bullhead (Cottus gobio): inference under a Bayesian spatially explicit framework, 2008, Molecular ecology.

[60] Paul Marjoram, et al. Approximately Sufficient Statistics and Bayesian Computation, 2011, Statistical Applications in Genetics and Molecular Biology.

[61] Mary K. Kuhner. Coalescent genealogy samplers: windows into population history, 2009, Trends in ecology & evolution.

[62] Jean-Marie Hombert, et al. Origins and Genetic Diversity of Pygmy Hunter-Gatherers from Western Central Africa, 2009, Current Biology.