Likelihood-Free Inference in High-Dimensional Models

Methods that bypass analytical evaluation of the likelihood function have become an indispensable tool for statistical inference in many fields of science. These so-called likelihood-free methods accept or reject simulations based on summary statistics, which limits them to low-dimensional models in which the likelihood is large enough to yield manageable acceptance rates. To get around this limitation, we introduce a novel likelihood-free Markov chain Monte Carlo (MCMC) method that combines two key innovations: it updates only one parameter per iteration, and it accepts or rejects each update based on a subset of statistics that is approximately sufficient for that parameter. This increases acceptance rates dramatically, making the approach suitable even for models of very high dimensionality. We further show that for linear models, a one-dimensional combination of statistics per parameter is sufficient and can be found empirically with simulations. Finally, we demonstrate that our method readily scales to models of very high dimensionality, both on toy models and by jointly inferring the effective population size, the distribution of fitness effects (DFE) of segregating mutations, and the selection coefficient of each locus from data of a recent experiment on the evolution of drug resistance in influenza.
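The per-parameter update described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the toy simulator, the uniform prior, the fixed tolerance `eps`, and the assumption that statistic `i` alone is approximately sufficient for parameter `i` are all hypothetical choices made for illustration.

```python
import numpy as np


def simulate(theta, rng, noise_sd=0.1):
    """Hypothetical toy simulator: statistic i is parameter i plus Gaussian noise."""
    return theta + rng.normal(0.0, noise_sd, size=theta.shape)


def per_parameter_abc_mcmc(s_obs, n_iter, eps, step, lo, hi, rng):
    """Likelihood-free MCMC that updates one parameter per iteration.

    Assumptions (not from the source): uniform prior on [lo, hi] for every
    parameter, symmetric Gaussian proposal, and s[i] taken as the statistic
    that is approximately sufficient for theta[i].
    """
    n = s_obs.size
    theta = np.clip(s_obs, lo, hi).astype(float)  # start near the observed data
    chain = np.empty((n_iter, n))
    n_accepted = 0
    for t in range(n_iter):
        i = t % n  # cycle through the parameters, one per iteration
        proposal = theta.copy()
        proposal[i] += rng.normal(0.0, step)
        # With a uniform prior and a symmetric proposal the Metropolis-Hastings
        # ratio is 1 inside the prior bounds and 0 outside.
        if lo <= proposal[i] <= hi:
            s_sim = simulate(proposal, rng)
            # Compare only the statistic relevant to the updated parameter.
            if abs(s_sim[i] - s_obs[i]) < eps:
                theta = proposal
                n_accepted += 1
        chain[t] = theta
    return chain, n_accepted / n_iter


rng = np.random.default_rng(1)
n_params = 50
true_theta = rng.uniform(-1.0, 1.0, size=n_params)
s_obs = simulate(true_theta, rng)

chain, rate = per_parameter_abc_mcmc(
    s_obs, n_iter=400 * n_params, eps=0.1, step=0.1, lo=-2.0, hi=2.0, rng=rng
)
posterior_mean = chain.mean(axis=0)
print(f"acceptance rate: {rate:.2f}")
print(f"largest |posterior mean - observed statistic|: "
      f"{np.max(np.abs(posterior_mean - s_obs)):.3f}")
```

In this sketch each decision depends on a single distance, so the acceptance rate does not shrink as parameters are added. A scheme that updated all 50 parameters jointly and compared the full vector of statistics would need every distance to fall below `eps` at once, which is the bottleneck the abstract refers to.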
