Bayesian parameter estimation for biochemical reaction networks using region-based adaptive parallel tempering

Motivation: Mathematical models have become standard tools for investigating cellular processes and unraveling signal-processing mechanisms. The parameters of these models are usually inferred from the available data using optimization and sampling methods. However, the efficiency of these methods is limited by properties of the mathematical model, e.g. non-identifiabilities, and of the resulting posterior distribution. In particular, multi-modal distributions with long valleys or pronounced tails are difficult to optimize and sample. Thus, the development and improvement of optimization and sampling methods remain subjects of ongoing research.

Results: We propose a region-based adaptive parallel tempering algorithm that adapts to problem-specific features of the posterior distribution, i.e. its modes and valleys. The algorithm combines several established algorithms to overcome their individual shortcomings and to improve sampling efficiency. We assessed its properties on established benchmark problems and on two ordinary differential equation models of biochemical reaction networks. The proposed algorithm outperformed state-of-the-art methods in terms of computational efficiency and mixing. Because the algorithm does not rely on a specific problem structure but adapts to the posterior distribution, it is suitable for a variety of model classes.

Availability and implementation: The MATLAB code is available both as Supplementary Material and in a Git repository.
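To illustrate the parallel tempering idea the abstract builds on, here is a minimal sketch of plain adaptive parallel tempering in Python/NumPy. It is not the region-based algorithm proposed here. Several tempered chains target p(x)^beta; each chain adapts its Gaussian proposal covariance from its own history, in the spirit of the adaptive Metropolis algorithm of Haario et al., and neighbouring chains propose to swap states. The swap between chains k and k+1 is accepted with probability min(1, exp((beta_k - beta_{k+1}) (log p(x_{k+1}) - log p(x_k)))). All names (`log_target`, `adaptive_parallel_tempering`), the bimodal test density, and the fixed temperature ladder are hypothetical choices for illustration. The sketch also assumes that the whole posterior is tempered rather than only the likelihood, and it does not reproduce the region-based partitioning of the parameter space.

```python
import numpy as np


def log_target(x):
    """Hypothetical bimodal test density (unnormalised): equal mixture of two
    2-D unit-covariance Gaussians centred at (-4, -4) and (4, 4)."""
    a = -0.5 * np.sum((x + 4.0) ** 2)
    b = -0.5 * np.sum((x - 4.0) ** 2)
    return np.logaddexp(a, b)


def adaptive_parallel_tempering(log_post, x0, betas, n_iter, seed=0):
    """Plain adaptive parallel tempering (illustrative sketch only).

    Chain k targets p(x)^betas[k]; betas[0] = 1 is the untempered posterior.
    The temperature ladder is fixed; only the proposal covariances adapt.
    Returns the cold-chain samples and per-pair swap acceptance rates."""
    rng = np.random.default_rng(seed)
    d = len(x0)
    n_chains = len(betas)
    x = np.tile(np.asarray(x0, dtype=float), (n_chains, 1))
    lp = np.array([log_post(xi) for xi in x])
    mean = x.copy()                            # running mean per chain
    cov = np.tile(np.eye(d), (n_chains, 1, 1))  # running covariance per chain
    scale = 2.38 ** 2 / d                       # classic adaptive Metropolis scaling
    samples = np.empty((n_iter, d))
    swaps_accepted = np.zeros(n_chains - 1)

    for t in range(1, n_iter + 1):
        # 1) Local random-walk Metropolis step in every tempered chain.
        for k in range(n_chains):
            prop_cov = scale * cov[k] + 1e-6 * np.eye(d)  # regularised proposal
            y = rng.multivariate_normal(x[k], prop_cov)
            lp_y = log_post(y)
            if np.log(rng.uniform()) < betas[k] * (lp_y - lp[k]):
                x[k], lp[k] = y, lp_y
            # 2) Recursive mean/covariance update with vanishing step size.
            gamma = 1.0 / (t + 1)
            diff = x[k] - mean[k]
            mean[k] = mean[k] + gamma * diff
            cov[k] = cov[k] + gamma * (np.outer(diff, diff) - cov[k])

        # 3) Propose swaps between neighbouring temperatures.
        for k in range(n_chains - 1):
            log_a = (betas[k] - betas[k + 1]) * (lp[k + 1] - lp[k])
            if np.log(rng.uniform()) < log_a:
                x[[k, k + 1]] = x[[k + 1, k]]
                lp[[k, k + 1]] = lp[[k + 1, k]]
                swaps_accepted[k] += 1

        samples[t - 1] = x[0]  # keep only the cold (beta = 1) chain

    return samples, swaps_accepted / n_iter
```

For example, `adaptive_parallel_tempering(log_target, [-4.0, -4.0], [1.0, 0.3, 0.1, 0.03], 3000, seed=1)` starts every chain in one mode. The hot chains cross the low-density region between the modes, and the swaps carry those states down to the cold chain, which a single random-walk chain started in one mode would rarely manage.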
