A practical guide to pseudo-marginal methods for computational inference in systems biology.

For many stochastic models of interest in systems biology, such as those describing biochemical reaction networks, exact quantification of parameter uncertainty through statistical inference is intractable. Likelihood-free computational inference techniques enable parameter inference when the likelihood function for the model is intractable but the generation of many sample paths is feasible through stochastic simulation of the forward problem. The most common likelihood-free method in systems biology is approximate Bayesian computation, which accepts parameters that result in low discrepancy between stochastic simulations and measured data. However, it can be difficult to assess how the accuracy of the resulting inferences is affected by the choice of acceptance threshold and discrepancy function. The pseudo-marginal approach is an alternative likelihood-free inference method that utilises a Monte Carlo estimate of the likelihood function. This approach has several advantages, particularly in the context of noisy, partially observed, time-course data typical in biochemical reaction network studies. Specifically, the pseudo-marginal approach facilitates exact inference and uncertainty quantification, and may be efficiently combined with particle filters for low-variance, high-accuracy likelihood estimation. In this review, we provide a practical introduction to the pseudo-marginal approach using inference for biochemical reaction networks as a series of case studies. Implementations of key algorithms and examples are provided using the Julia programming language, a high-performance, open-source programming language for scientific computing (https://github.com/davidwarne/Warne2019_GuideToPseudoMarginal).
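As a rough illustration of the idea of replacing the likelihood with a Monte Carlo estimate, the following minimal sketch runs a pseudo-marginal Metropolis-Hastings sampler on a toy latent-variable model. It is written in Python rather than Julia and is not taken from the linked repository. The model (a latent state x ~ N(theta, 1) observed as y ~ N(x, 0.5^2)), the Gaussian prior, and the names `normal_pdf`, `likelihood_estimate`, and `pseudo_marginal_mh` are all assumptions made for this example.

```python
import math
import random


def normal_pdf(x, mean, sd):
    """Density of a normal distribution with the given mean and standard deviation."""
    z = (x - mean) / sd
    return math.exp(-0.5 * z * z) / (sd * math.sqrt(2.0 * math.pi))


def likelihood_estimate(theta, data, n_particles, rng, obs_sd=0.5):
    """Unbiased Monte Carlo estimate of p(data | theta) for the assumed toy model.

    Each observation y has a latent state x ~ N(theta, 1) and y ~ N(x, obs_sd^2).
    The latent state is integrated out by averaging the observation density
    over simulated latent states.
    """
    estimate = 1.0
    for y in data:
        total = 0.0
        for _ in range(n_particles):
            x = rng.gauss(theta, 1.0)          # simulate the latent state
            total += normal_pdf(y, x, obs_sd)  # weight by the observation density
        estimate *= total / n_particles
    return estimate


def pseudo_marginal_mh(data, n_iters, n_particles, step_sd, rng, prior_sd=10.0):
    """Random-walk Metropolis-Hastings using estimated likelihoods.

    The estimate attached to the current state is stored and reused, never
    recomputed; this is what keeps the exact posterior as the target.
    """
    theta = 0.0
    like = likelihood_estimate(theta, data, n_particles, rng)
    chain = []
    for _ in range(n_iters):
        proposal = theta + rng.gauss(0.0, step_sd)
        like_prop = likelihood_estimate(proposal, data, n_particles, rng)
        num = like_prop * normal_pdf(proposal, 0.0, prior_sd)
        den = like * normal_pdf(theta, 0.0, prior_sd)
        # Accept with probability min(1, num / den), written without division.
        if rng.random() * den < num:
            theta, like = proposal, like_prop
        chain.append(theta)
    return chain
```

In this toy model the marginal likelihood is available in closed form (y ~ N(theta, 1.25)), which makes it convenient for checking the sampler. For the biochemical reaction network models discussed in the review, the latent-state simulation would instead be a stochastic simulation of the network, typically inside a particle filter.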
