Approximate Bayesian inference in semi-mechanistic models

Inference of interaction networks represented by systems of differential equations is a challenging problem in many scientific disciplines. In the present article, we follow a semi-mechanistic modelling approach based on gradient matching. We investigate the extent to which key factors, including the kinetic model, the statistical formulation and the numerical methods, affect network-reconstruction performance. We emphasize general lessons for computational statisticians faced with model selection, and we assess the accuracy of various alternative paradigms, including the recently proposed widely applicable information criteria and different numerical procedures for approximating Bayes factors. We conduct the comparative evaluation with a novel inferential pipeline that systematically disambiguates confounding factors via an ANOVA scheme.
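The model-selection comparison described above can be illustrated on a toy problem. The following is a minimal sketch, not the authors' pipeline, and it rests on several assumptions. There is a single hypothetical target species x1 driven by a known input x2(t) = 2 + sin t. The kinetic model is linear in its parameters, so gradient matching reduces to Bayesian linear regression of estimated derivatives on candidate regulator terms. The prior is Gaussian with variance alpha, and the noise level sigma is known. Central finite differences (`np.gradient`) stand in for the Gaussian-process interpolation usually used for gradient matching. For each candidate regulator set, the sketch computes three quantities: the exact log marginal likelihood, a thermodynamic-integration (power posterior) estimate of that same quantity, and the log-scale WAIC. For this conjugate model the power-posterior expectations are available in closed form. They stand in for the MCMC averages that a realistic kinetic model would require. All function names (`simulate`, `log_evidence`, `power_posterior`, `thermodynamic_integration`, `elpd_waic`) are illustrative.

```python
import numpy as np


def simulate(t_obs, n_fine=10_000, k_in=0.5, k_deg=0.3, x1_init=1.0):
    """Hypothetical toy system: Euler-integrate dx1/dt = k_in * x2 - k_deg * x1, x2(t) = 2 + sin(t)."""
    t_fine = np.linspace(0.0, t_obs[-1], n_fine + 1)
    dt = t_fine[1] - t_fine[0]
    drive = 2.0 + np.sin(t_fine)
    x1 = np.empty_like(t_fine)
    x1[0] = x1_init
    for j in range(n_fine):
        x1[j + 1] = x1[j] + dt * (k_in * drive[j] - k_deg * x1[j])
    return np.interp(t_obs, t_fine, x1), 2.0 + np.sin(t_obs)


def log_evidence(X, y, sigma, alpha):
    """Exact log p(y) for y = X w + eps, w ~ N(0, alpha I), eps ~ N(0, sigma^2 I)."""
    n = len(y)
    C = sigma**2 * np.eye(n) + alpha * X @ X.T
    _, logdet = np.linalg.slogdet(C)
    return -0.5 * (n * np.log(2 * np.pi) + logdet + y @ np.linalg.solve(C, y))


def power_posterior(X, y, sigma, alpha, t):
    """Gaussian power posterior p_t(w), proportional to p(y | w)^t p(w)."""
    A = t * X.T @ X / sigma**2 + np.eye(X.shape[1]) / alpha  # posterior precision
    cov = np.linalg.inv(A)
    cov = 0.5 * (cov + cov.T)  # symmetrise against round-off
    mean = cov @ (t * X.T @ y / sigma**2)
    return mean, cov


def expected_loglik(X, y, sigma, alpha, t):
    """E_t[log p(y | w)], in closed form here (an MCMC average in general)."""
    mean, cov = power_posterior(X, y, sigma, alpha, t)
    r = y - X @ mean
    spread = np.sum((X @ cov) * X)  # trace(X cov X^T)
    return -0.5 * len(y) * np.log(2 * np.pi * sigma**2) - (r @ r + spread) / (2 * sigma**2)


def thermodynamic_integration(X, y, sigma, alpha, n_temps=400, power=5):
    """log Z = integral over t in [0, 1] of E_t[log p(y | w)], via the trapezoidal rule."""
    temps = (np.arange(n_temps + 1) / n_temps) ** power  # ladder concentrated near t = 0
    vals = np.array([expected_loglik(X, y, sigma, alpha, t) for t in temps])
    return np.sum(0.5 * (vals[1:] + vals[:-1]) * np.diff(temps))


def elpd_waic(X, y, sigma, alpha, rng, n_draws=4000):
    """Log-scale WAIC: lppd minus the summed pointwise posterior variance of log p(y_i | w)."""
    mean, cov = power_posterior(X, y, sigma, alpha, 1.0)
    W = rng.multivariate_normal(mean, cov, size=n_draws)  # posterior draws, shape (S, d)
    resid = y[None, :] - W @ X.T                           # shape (S, n)
    ll = -0.5 * np.log(2 * np.pi * sigma**2) - resid**2 / (2 * sigma**2)
    m = ll.max(axis=0)
    lppd = np.sum(m + np.log(np.mean(np.exp(ll - m), axis=0)))  # stable log-mean-exp
    p_waic = np.sum(np.var(ll, axis=0, ddof=1))
    return lppd - p_waic


# Hypothetical toy data; not the benchmark used in the article.
rng = np.random.default_rng(0)
t_obs = np.linspace(0.0, 10.0, 101)
x1, x2 = simulate(t_obs)
x1_obs = x1 + rng.normal(0.0, 0.01, size=x1.shape)
dx1 = np.gradient(x1_obs, t_obs)  # crude stand-in for derivatives of a GP interpolant

# Candidate kinetic models for dx1/dt: linear in parameters, differing in the regulator set.
candidates = {"x1 only": x1_obs[:, None], "x1 + x2": np.column_stack([x1_obs, x2])}
sigma, alpha = 0.1, 1.0  # assumed noise level and prior variance
results = {}
for name, X in candidates.items():
    exact = log_evidence(X, dx1, sigma, alpha)
    ti = thermodynamic_integration(X, dx1, sigma, alpha)
    waic = elpd_waic(X, dx1, sigma, alpha, rng)
    results[name] = (exact, ti, waic)
    print(f"{name:8s} log Z exact {exact:9.2f}  TI {ti:9.2f}  elpd_WAIC {waic:9.2f}")
```

Because the power-posterior expectations are exact in this sketch, any gap between the thermodynamic-integration estimate and the exact evidence is pure quadrature error from the temperature ladder. With MCMC-based expectations, Monte Carlo error would add to it. The ladder t_i = (i/N)^5 places most temperatures near t = 0, where E_t[log p(y | w)] changes fastest.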
