Rapid Bayesian Inference for Expensive Stochastic Models

Almost all fields of science rely on statistical inference to estimate unknown parameters in theoretical and computational models. While the performance of modern computer hardware continues to grow, the computational requirements for simulating models are growing even faster. This is largely due to the increase in model complexity, often including stochastic dynamics, that is necessary to describe and characterize phenomena observed using modern, high-resolution experimental techniques. Such models are rarely analytically tractable, meaning that extremely large numbers of stochastic simulations are required for parameter inference. In such cases, parameter inference can be practically impossible. In this work, we present new computational Bayesian techniques that accelerate inference for expensive stochastic models in two ways: by using computationally inexpensive approximations to identify feasible regions in parameter space, and by learning transforms that adjust the biased approximate inferences so that they more closely represent the correct inferences under the expensive stochastic model. Using topical examples from ecology and cell biology, we demonstrate a speed improvement of an order of magnitude without any loss in accuracy. This represents a substantial improvement over current state-of-the-art methods for Bayesian computation when appropriate model approximations are available.
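
As a rough illustration of the first idea only (using a cheap approximation to narrow down the feasible region before running the expensive model), the following minimal sketch applies two-stage rejection-based approximate Bayesian computation to a toy problem. It is not the method developed in this work: the models `expensive_sim` and `cheap_sim`, the uniform prior on [0, 50], the tolerances and the function names are all hypothetical choices made for the example, and the learned transforms that correct the bias of the approximation are not shown. Restricting proposals to the feasible region without such a correction can distort the inference if the approximation is poor.

```python
import random

def expensive_sim(theta, rng, n=50):
    # Hypothetical stand-in for an expensive stochastic model:
    # the average of n noisy observations with mean and variance theta.
    return sum(rng.gauss(theta, theta ** 0.5) for _ in range(n)) / n

def cheap_sim(theta):
    # Hypothetical cheap deterministic approximation, deliberately biased.
    return 0.9 * theta

def abc_reject(simulate, proposals, observed, eps):
    # Keep the proposals whose simulated output lies within eps of the data.
    return [t for t in proposals if abs(simulate(t) - observed) <= eps]

def two_stage_abc(observed, rng, n_prop=2000, eps_cheap=2.0, eps=0.5):
    # Stage 1: loose rejection under the cheap model, sampling the prior,
    # to find a feasible interval in parameter space.
    prior_draws = [rng.uniform(0.0, 50.0) for _ in range(n_prop)]
    feasible = abc_reject(cheap_sim, prior_draws, observed, eps_cheap)
    lo, hi = min(feasible), max(feasible)
    # Stage 2: tighter rejection under the expensive model, with
    # proposals drawn only from the feasible interval.
    proposals = [rng.uniform(lo, hi) for _ in range(n_prop)]
    accepted = abc_reject(lambda t: expensive_sim(t, rng),
                          proposals, observed, eps)
    return (lo, hi), accepted

rng = random.Random(0)
(lo, hi), posterior = two_stage_abc(observed=10.0, rng=rng)
print(f"feasible interval: [{lo:.2f}, {hi:.2f}]")
print(f"accepted {len(posterior)} of 2000 expensive simulations")
```

In this toy setting, every expensive simulation in the second stage is spent inside the interval identified by the cheap model rather than across the full prior range, which is where the saving comes from.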
