Defining informative priors for ensemble modeling in systems biology

Ensemble modeling in molecular systems biology requires the reproducible translation of kinetic parameter data into informative probability distributions (priors), as well as approaches that sample parameters from these distributions without violating the thermodynamic consistency of the overall model. Although a number of pioneering frameworks for ensemble modeling have been published, the issue of generating informative priors has not yet been addressed. Here, we present a protocol that aims to fill this gap. The protocol covers the collection of parameter values from a diverse range of sources (literature, databases and experiments), the assessment of their plausibility, and the creation of log-normal probability distributions that can be used as informative priors in ensemble modeling. Furthermore, the protocol enables sampling from the generated distributions while maintaining thermodynamic consistency. Once all parameter values have been retrieved from literature and databases, the protocol can be implemented within ~5–10 min per parameter. The aim of this protocol is to facilitate the design and use of informative distributions for ensemble modeling, especially in fields such as synthetic biology and systems medicine.
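
To make the two ideas in the abstract concrete, the following is a minimal sketch, not the protocol's own procedure. It assumes a single reversible binding step whose thermodynamic constraint is Kd = k_off / k_on, fits a log-normal prior to each of two independently chosen parameters from the mean and standard deviation of their log-transformed values, and derives the third parameter so that the constraint holds in every sample. The function names and the input values are hypothetical and serve only as illustration.

```python
import numpy as np


def lognormal_prior_from_values(values):
    """Return (mu, sigma) of ln(values), i.e. the location and scale
    parameters of a log-normal distribution fitted to the values.
    Assumption: a plain log-space fit, chosen here for illustration."""
    logs = np.log(np.asarray(values, dtype=float))
    return logs.mean(), logs.std(ddof=1)


def sample_consistent(k_on_values, kd_values, n, seed=0):
    """Sample k_on and Kd from their log-normal priors and derive k_off.

    Only k_on and Kd are sampled independently. k_off is computed from
    them, so Kd = k_off / k_on is satisfied by construction."""
    rng = np.random.default_rng(seed)
    mu_on, sd_on = lognormal_prior_from_values(k_on_values)
    mu_kd, sd_kd = lognormal_prior_from_values(kd_values)
    k_on = rng.lognormal(mean=mu_on, sigma=sd_on, size=n)
    kd = rng.lognormal(mean=mu_kd, sigma=sd_kd, size=n)
    k_off = kd * k_on  # dependent parameter, never sampled directly
    return k_on, k_off, kd


if __name__ == "__main__":
    # Hypothetical collected values; units are arbitrary for this sketch.
    k_on_reported = [1.2e5, 3.0e5, 0.8e5]
    kd_reported = [2.0e-9, 5.0e-9, 1.0e-9]
    k_on, k_off, kd = sample_consistent(k_on_reported, kd_reported, n=1000)
    print(np.allclose(k_off / k_on, kd))
```

Sampling all three parameters independently would generally break the constraint, which is why one of them is treated as dependent here. Which parameter to derive is a modeling choice.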
