Likelihood-free estimation of model evidence

Statistical inference typically requires the likelihood function to be computable in a reasonable amount of time. The class of "likelihood-free" methods known as Approximate Bayesian Computation (ABC) removes this requirement by replacing evaluation of the likelihood with simulation from the model. Likelihood-free methods have gained in efficiency and popularity in the past few years, following their integration with Markov chain Monte Carlo (MCMC) and Sequential Monte Carlo (SMC) samplers, which explore the parameter space more effectively. They have been applied primarily to estimating the parameters of a given model, but they can also be used to compare models. Here we present novel likelihood-free approaches to model comparison, based on independent estimation of the evidence of each model under study. The key advantages of these approaches over previous techniques are that they can exploit MCMC or SMC algorithms to explore the parameter space, and that they do not require a sampler able to mix between models. We validate the proposed methods on a simple exponential family problem before applying them to a realistic problem from population genetics: the comparison of different growth models based on observations of human Y chromosome data from the terminal generation.
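
To make the idea of estimating each model's evidence independently more concrete, the following is a minimal sketch using plain ABC rejection sampling. It is not the MCMC- or SMC-based estimators proposed here. With a uniform acceptance kernel, the fraction of prior-predictive simulations that land within a tolerance of the observed summaries is proportional to the ABC approximation of the evidence, so the ratio of two such fractions approximates a Bayes factor. Everything named in the code is an assumption made for illustration: the two toy models (exponential and half-normal), the Gamma(2, 1) prior, the summary statistics (sample mean and standard deviation), the tolerance `eps`, and the function names.

```python
import random
import statistics


def summary(data):
    """Summary statistics (assumed for illustration): sample mean and standard deviation."""
    return statistics.mean(data), statistics.stdev(data)


def prior_draw(rng):
    """Assumed Gamma(shape=2, scale=1) prior on the positive parameter of either model."""
    return rng.gammavariate(2.0, 1.0)


def sim_exponential(theta, n, rng):
    """Toy model A: n i.i.d. exponential observations with rate theta."""
    return [rng.expovariate(theta) for _ in range(n)]


def sim_halfnormal(theta, n, rng):
    """Toy model B: n i.i.d. half-normal observations with scale theta."""
    return [abs(rng.gauss(0.0, theta)) for _ in range(n)]


def abc_evidence(simulate, prior, observed, eps, n_draws, rng):
    """Acceptance rate of prior-predictive simulations under a uniform kernel.

    This is proportional to the ABC evidence estimate; the constant of
    proportionality depends only on eps and the summaries, not on the model.
    """
    s_obs = summary(observed)
    accepted = 0
    for _ in range(n_draws):
        theta = prior(rng)                       # draw a parameter from the prior
        s_sim = summary(simulate(theta, len(observed), rng))
        # Chebyshev (max-abs) distance between simulated and observed summaries
        dist = max(abs(a - b) for a, b in zip(s_sim, s_obs))
        if dist <= eps:
            accepted += 1
    return accepted / n_draws


if __name__ == "__main__":
    rng = random.Random(2024)
    observed = sim_exponential(1.0, 30, rng)     # synthetic data from model A
    z_a = abc_evidence(sim_exponential, prior_draw, observed, 0.2, 5000, rng)
    z_b = abc_evidence(sim_halfnormal, prior_draw, observed, 0.2, 5000, rng)
    print("acceptance rate, model A:", z_a)
    print("acceptance rate, model B:", z_b)
    if z_b > 0:
        print("estimated Bayes factor A vs B:", z_a / z_b)
```

Because both models are scored with the same summaries and the same tolerance, the unknown normalising constant cancels in the ratio. Each model is handled by its own run, so no sampler has to move between models. The acceptance rate falls quickly as `eps` shrinks, which is one reason to prefer MCMC or SMC exploration of the parameter space over simple rejection.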
