Bayesian ranking of biochemical system models

MOTIVATION There are often many alternative models of a biochemical system. Distinguishing between models and finding the most suitable ones is an important challenge in Systems Biology, because ranking models by experimental evidence helps to judge the support for the working hypotheses that form each model. Bayes factors are employed as a measure of evidential preference for one model over another. The marginal likelihood is a key component of Bayes factors; however, computing it is difficult, as it involves integrating nonlinear functions over a multidimensional space. A number of methods are available to compute the marginal likelihood approximately. A detailed investigation of these methods is required to find those that perform appropriately for biochemical modelling.

RESULTS We assess four methods for estimating the marginal likelihoods required for computing Bayes factors. The Prior Arithmetic Mean estimator, the Posterior Harmonic Mean estimator, Annealed Importance Sampling and Annealing-Melting Integration are investigated and compared on a typical case study in Systems Biology. This allows us to understand the stability of the analysis results and to make reliable judgements in an uncertain context. We investigate the variance of Bayes factor estimates and highlight the stability of the Annealed Importance Sampling and Annealing-Melting Integration methods for the purposes of comparing nonlinear models.

AVAILABILITY Models used in this study are available in SBML format as supplementary material to this article.
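The two simplest estimators named in the abstract can be illustrated on a toy problem. The sketch below is not the case study from the article: it assumes a one-parameter Gaussian model with known noise level, chosen only because its marginal likelihood has a closed form to compare against. All function names, the two "models" (which differ only in prior width `tau`), and the sample sizes are hypothetical choices made for illustration.

```python
import numpy as np

def log_likelihood(theta, y, sigma):
    """Log-likelihood of data y under y_i ~ N(theta, sigma^2), for each theta sample."""
    resid = y[None, :] - theta[:, None]
    return (-0.5 * np.sum(resid**2, axis=1) / sigma**2
            - 0.5 * y.size * np.log(2.0 * np.pi * sigma**2))

def logmeanexp(a):
    """Numerically stable log(mean(exp(a)))."""
    m = np.max(a)
    return m + np.log(np.mean(np.exp(a - m)))

def log_ml_prior_arithmetic_mean(y, sigma, tau, n_samples, rng):
    """Average the likelihood over draws from the prior theta ~ N(0, tau^2)."""
    theta = rng.normal(0.0, tau, size=n_samples)
    return logmeanexp(log_likelihood(theta, y, sigma))

def log_ml_posterior_harmonic_mean(y, sigma, tau, n_samples, rng):
    """Harmonic mean of the likelihood over posterior draws.

    Assumption: the posterior is sampled exactly via conjugacy; in a
    nonlinear biochemical model it would come from MCMC instead.
    """
    n = y.size
    post_var = 1.0 / (n / sigma**2 + 1.0 / tau**2)
    post_mean = post_var * np.sum(y) / sigma**2
    theta = rng.normal(post_mean, np.sqrt(post_var), size=n_samples)
    return -logmeanexp(-log_likelihood(theta, y, sigma))

def log_ml_exact(y, sigma, tau):
    """Closed form: marginally, y ~ N(0, sigma^2 I + tau^2 11^T)."""
    n = y.size
    cov = sigma**2 * np.eye(n) + tau**2 * np.ones((n, n))
    _, logdet = np.linalg.slogdet(cov)
    quad = y @ np.linalg.solve(cov, y)
    return -0.5 * (n * np.log(2.0 * np.pi) + logdet + quad)

rng = np.random.default_rng(0)
sigma = 1.0
y = rng.normal(0.5, sigma, size=5)  # synthetic data, illustration only

# Two hypothetical competing models: narrow prior (tau=1) vs wide prior (tau=10).
pam_1 = log_ml_prior_arithmetic_mean(y, sigma, 1.0, 200_000, rng)
pam_2 = log_ml_prior_arithmetic_mean(y, sigma, 10.0, 200_000, rng)
phm_1 = log_ml_posterior_harmonic_mean(y, sigma, 1.0, 200_000, rng)
exact_1 = log_ml_exact(y, sigma, 1.0)
exact_2 = log_ml_exact(y, sigma, 10.0)

# The log Bayes factor is the difference of log marginal likelihoods.
print("log BF (prior arithmetic mean):", pam_1 - pam_2)
print("log BF (exact):                ", exact_1 - exact_2)
print("log ML model 1, harmonic mean vs exact:", phm_1, exact_1)
```

In this toy setting the harmonic mean estimator has infinite variance whenever the data are more informative than the prior, so repeated runs with different seeds can give noticeably different values; that is one way to see why the variance of Bayes factor estimates matters when choosing an estimator.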
