Statistical Applications in Genetics and Molecular Biology

Deviance Information Criteria for Model Selection in Approximate Bayesian Computation

Approximate Bayesian computation (ABC) is a class of algorithmic methods for Bayesian inference that rely on statistical summaries and computer simulations. ABC has become popular in evolutionary genetics and in other branches of biology. However, model selection under ABC algorithms has been a subject of intense debate in recent years. Here, we propose novel approaches to model selection based on posterior predictive distributions and approximations of the deviance. We argue that this framework can settle some contradictions between the computation of model probabilities and posterior predictive checks using ABC posterior distributions. A simulation study and an analysis of a resequencing data set of human DNA show that the deviance criteria lead to sensible results in a number of model choice problems of interest to population geneticists.
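
To make the phrase "statistical summaries and computer simulations" concrete, the following is a minimal sketch of the basic ABC rejection scheme, not the deviance criteria proposed in this work. Everything in it is an illustrative assumption: the toy Gaussian model, the uniform prior, the sample mean as summary statistic, the tolerance, and the function name `abc_rejection`.

```python
import random
import statistics


def abc_rejection(observed, prior_sampler, simulator, summary, tolerance, n_draws, rng):
    """Basic ABC rejection: keep prior draws whose simulated summary
    falls within `tolerance` of the observed summary."""
    s_obs = summary(observed)
    accepted = []
    for _ in range(n_draws):
        theta = prior_sampler(rng)              # draw a parameter from the prior
        s_sim = summary(simulator(theta, rng))  # simulate data, then summarise it
        if abs(s_sim - s_obs) <= tolerance:     # compare summaries, not likelihoods
            accepted.append(theta)
    return accepted


# Toy setting (assumed for illustration): Gaussian data with unknown mean.
rng = random.Random(1)
observed = [rng.gauss(2.0, 1.0) for _ in range(50)]

posterior = abc_rejection(
    observed,
    prior_sampler=lambda r: r.uniform(-5.0, 5.0),
    simulator=lambda theta, r: [r.gauss(theta, 1.0) for _ in range(50)],
    summary=statistics.fmean,
    tolerance=0.1,
    n_draws=5000,
    rng=rng,
)

print(len(posterior), statistics.fmean(posterior))
```

The accepted values form a sample from an approximate posterior distribution. No likelihood is ever evaluated, which is why model comparison in this setting needs approximations such as the ones discussed here.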
