Simulation-based model selection for dynamical systems in systems and population biology

Motivation: Computer simulations have become an important tool across the biomedical sciences and beyond. For many important problems several different models or hypotheses exist, and choosing which one best describes reality or observed data is not straightforward. We therefore require suitable statistical tools that allow us to choose rationally between different mechanistic models of, e.g., signal transduction or gene regulation networks. This is particularly challenging in systems biology, where only a small number of molecular species can be assayed at any given time and all measurements are subject to measurement uncertainty.

Results: Here, we develop such a model selection framework based on approximate Bayesian computation and employing sequential Monte Carlo sampling. We show that our approach can be applied across a wide range of biological scenarios, and we illustrate its use on real data describing influenza dynamics and the JAK-STAT signalling pathway. Bayesian model selection strikes a balance between the complexity of the simulation models and their ability to describe observed data. The present approach enables us to apply the whole formal apparatus of Bayesian model selection to any system that can be (efficiently) simulated, even when exact likelihoods are computationally intractable.

Contact: ttoni@imperial.ac.uk; m.stumpf@imperial.ac.uk

Supplementary information: Supplementary data are available at Bioinformatics online.
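The idea of likelihood-free model selection can be illustrated with a minimal sketch. The code below is not the sequential Monte Carlo scheme of the paper; it uses the simpler ABC rejection sampler, and the two toy decay models, the uniform priors, the tolerance `eps` and all function names are assumptions chosen purely for illustration. A model index is drawn together with a parameter, the model is simulated, and the draw is kept only if the simulated trajectory lies within `eps` of the observed data. The fraction of accepted draws per model then approximates the posterior model probability.

```python
import math
import random

def simulate(model, k, times):
    """Toy deterministic models (hypothetical, for illustration only)."""
    if model == 0:
        # Model 0: exponential decay
        return [math.exp(-k * t) for t in times]
    # Model 1: linear decay, floored at zero
    return [max(0.0, 1.0 - k * t) for t in times]

def distance(a, b):
    """Euclidean distance between two trajectories."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def abc_rejection_model_selection(observed, times, n_accept=500, eps=0.3, seed=0):
    """Approximate posterior model probabilities by ABC rejection."""
    rng = random.Random(seed)
    counts = [0, 0]
    while sum(counts) < n_accept:
        model = rng.randrange(2)       # assumed uniform prior over the two models
        k = rng.uniform(0.0, 2.0)      # assumed uniform prior over the rate parameter
        if distance(simulate(model, k, times), observed) <= eps:
            counts[model] += 1         # accept: simulation is close to the data
    total = sum(counts)
    return [c / total for c in counts]

# Synthetic "observed" data: exponential decay with small Gaussian noise.
times = [0.5 * i for i in range(11)]
data_rng = random.Random(1)
observed = [math.exp(-0.8 * t) + data_rng.gauss(0.0, 0.03) for t in times]

posterior = abc_rejection_model_selection(observed, times)
print("P(model 0 | data) ~", posterior[0])
print("P(model 1 | data) ~", posterior[1])
```

With equal prior model probabilities, the ratio of the two estimated posterior probabilities serves as an approximation to the Bayes factor. A sequential Monte Carlo variant would instead move a population of particles through a decreasing sequence of tolerances, which is typically far more efficient than plain rejection when the tolerance is small.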
