Efficient Bayesian estimates for discrimination among topologically different systems biology models.

A major effort in systems biology is the development of mathematical models that describe complex biological systems at multiple scales and levels of abstraction. Determining the topology (the set of interactions) of a biological system from observations of the system's behavior is an important and difficult problem. Here we present and demonstrate a new methodology for efficiently computing the probability distribution over a set of topologies based on consistency with existing measurements. Key features of the new approach include derivation in a Bayesian framework, incorporation of prior probability distributions of topologies and parameters, and use of an analytically integrable linearization based on the Fisher information matrix that is responsible for large gains in efficiency. The new method was demonstrated on a collection of four biological topologies representing a kinase and phosphatase that operate in opposition to each other with either processive or distributive kinetics, giving 8 to 12 parameters for each topology. The linearization produced an approximate result very rapidly (CPU minutes) that was highly accurate on its own, as compared to a Monte Carlo method guaranteed to converge to the correct answer but at greater cost (CPU weeks). The Monte Carlo method developed and applied here used the linearization method as a starting point and importance sampling to approach the Bayesian answer in acceptable time. Other inexpensive methods to estimate probabilities produced poor approximations for this system, with likelihood estimation showing its well-known bias toward topologies with more parameters and the Akaike and Schwarz Information Criteria showing a strong bias toward topologies with fewer parameters. These results suggest that this linear approximation may be an effective compromise, providing an answer whose accuracy is near the true Bayesian answer, but at a cost near the common heuristics.
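
The idea of an analytically integrable linearization can be illustrated with a minimal sketch; it is not the authors' implementation. It assumes Gaussian measurement noise with known standard deviation and a Gaussian prior on the parameters, and it swaps in two toy linear models for the kinase and phosphatase topologies so that the linearized evidence can be checked against a closed form. All function and variable names (`linearized_log_evidence`, `map_estimate_linear`, `topology_probabilities`, `designs`) are hypothetical.

```python
import numpy as np

def linearized_log_evidence(residual, jac, sigma, theta_map, prior_mean, prior_cov):
    """Approximate log p(data | topology) by a Gaussian integral around theta_map.

    Assumes (not from the source) Gaussian noise with std `sigma` and a
    Gaussian prior N(prior_mean, prior_cov). `jac` is the sensitivity of the
    model outputs to the parameters, evaluated at theta_map.
    """
    k, n = theta_map.size, residual.size
    fim = jac.T @ jac / sigma**2                 # Fisher information matrix
    prior_prec = np.linalg.inv(prior_cov)
    hessian = fim + prior_prec                   # curvature of -log posterior
    log_lik = (-0.5 * n * np.log(2 * np.pi * sigma**2)
               - 0.5 * residual @ residual / sigma**2)
    d = theta_map - prior_mean
    _, logdet_prior = np.linalg.slogdet(prior_cov)
    log_prior = (-0.5 * k * np.log(2 * np.pi) - 0.5 * logdet_prior
                 - 0.5 * d @ prior_prec @ d)
    _, logdet_h = np.linalg.slogdet(hessian)
    # Gaussian integral over parameters: (2*pi)^(k/2) * det(hessian)^(-1/2)
    return log_lik + log_prior + 0.5 * k * np.log(2 * np.pi) - 0.5 * logdet_h

def map_estimate_linear(y, jac, sigma, prior_mean, prior_cov):
    """Closed-form posterior mode for a model that is linear in its parameters."""
    prior_prec = np.linalg.inv(prior_cov)
    hessian = jac.T @ jac / sigma**2 + prior_prec
    rhs = jac.T @ y / sigma**2 + prior_prec @ prior_mean
    return np.linalg.solve(hessian, rhs)

def topology_probabilities(log_evidences, log_priors=None):
    """Normalize log evidences (plus optional log topology priors) to probabilities."""
    w = np.asarray(log_evidences, dtype=float)
    if log_priors is not None:
        w = w + np.asarray(log_priors, dtype=float)
    w = w - w.max()                              # guard against overflow
    p = np.exp(w)
    return p / p.sum()

# Toy data: a straight line with noise (illustrative only, not the paper's data).
rng = np.random.default_rng(0)
t = np.linspace(0.0, 1.0, 20)
sigma = 0.1
y = 1.0 + 2.0 * t + sigma * rng.normal(size=t.size)

# Two stand-in "topologies" that differ in parameter count.
designs = {
    "two_param": np.column_stack([np.ones_like(t), t]),
    "three_param": np.column_stack([np.ones_like(t), t, t**2]),
}

log_ev = {}
for name, jac in designs.items():
    k = jac.shape[1]
    mu0, cov0 = np.zeros(k), 10.0 * np.eye(k)    # assumed Gaussian prior
    theta = map_estimate_linear(y, jac, sigma, mu0, cov0)
    log_ev[name] = linearized_log_evidence(y - jac @ theta, jac, sigma,
                                           theta, mu0, cov0)

probs = topology_probabilities(list(log_ev.values()))
print(dict(zip(log_ev, probs)))
```

For models that are linear in their parameters, as in this toy case, the Gaussian integral is exact. For nonlinear kinetic models the Jacobian would come from a sensitivity analysis at the best-fit parameters, and the result is only an approximation whose quality has to be checked, for example against a Monte Carlo estimate as the abstract describes.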
