Data-driven reverse engineering of signaling pathways using ensembles of dynamic models

Despite significant efforts and remarkable progress, the inference of signaling networks from experimental data remains very challenging. The problem is particularly difficult when the objective is to obtain a dynamic model capable of predicting the effect of novel perturbations not considered during model training. The problem is ill-posed due to the nonlinear nature of these systems, the fact that only a fraction of the involved proteins and their post-translational modifications can be measured, and limitations of the technologies used for growing cells in vitro, perturbing them, and measuring their variations. As a consequence, there is a pervasive lack of identifiability. To overcome these issues, we present a methodology called SELDOM (enSEmbLe of Dynamic lOgic-based Models), which builds an ensemble of logic-based dynamic models, fits them to experimental data, and combines their individual simulations into an ensemble prediction. It also includes a model reduction step to prune spurious interactions and mitigate overfitting. SELDOM is a data-driven method, in the sense that it does not require any prior knowledge of the system: the interaction networks that act as scaffolds for the dynamic models are inferred from data using mutual information. We have tested SELDOM on a number of experimental and in silico signal transduction case studies, including the recent HPN-DREAM breast cancer challenge. We found that its performance in recovering network topology is highly competitive with state-of-the-art methods. More importantly, the utility of SELDOM goes beyond basic network inference (i.e., uncovering static interaction networks): it builds dynamic models based on ordinary differential equations, which can be used for mechanistic interpretation and for reliable dynamic predictions under new experimental conditions (i.e., conditions not used in training). For this task, SELDOM's ensemble prediction is not only consistently better than the predictions of individual models, but also often outperforms the state of the art represented by the methods used in the HPN-DREAM challenge.
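The two data-driven ingredients named above, mutual information as the source of network scaffolds and the combination of individual simulations into an ensemble prediction, can be illustrated with a minimal Python sketch. This is not SELDOM's implementation: the histogram estimator, the MI-weighted sampling of regulators, the use of the median as the combination rule, and all function and parameter names (`mutual_information`, `sample_scaffold`, `ensemble_prediction`, `bins`, `max_inputs`) are assumptions made for illustration. Model training and model reduction are omitted.

```python
import math
import random
from collections import Counter
from statistics import median


def mutual_information(x, y, bins=4):
    """Histogram (plug-in) estimate of mutual information, in nats."""
    def discretize(values):
        lo, hi = min(values), max(values)
        width = (hi - lo) / bins or 1.0  # avoid zero width for constant signals
        return [min(int((v - lo) / width), bins - 1) for v in values]

    dx, dy = discretize(x), discretize(y)
    n = len(dx)
    joint, px, py = Counter(zip(dx, dy)), Counter(dx), Counter(dy)
    # sum over occupied cells of p(i,j) * log(p(i,j) / (p(i) * p(j)))
    return sum((c / n) * math.log(c * n / (px[i] * py[j]))
               for (i, j), c in joint.items())


def sample_scaffold(mi, rng, max_inputs=2):
    """Draw one candidate network: each node gets regulators sampled
    with probability proportional to their MI with that node
    (an illustrative sampling rule, not the published one)."""
    scaffold = {}
    for target in mi:
        candidates = list(mi[target])
        weights = [mi[target][s] + 1e-12 for s in candidates]
        picks = set(rng.choices(candidates, weights=weights, k=max_inputs))
        scaffold[target] = sorted(picks)
    return scaffold


def ensemble_prediction(predictions):
    """Combine per-model trajectories time point by time point.
    The median is an assumed combination rule."""
    return [median(values) for values in zip(*predictions)]


# Toy data: B closely follows A, C is independent noise.
rng = random.Random(0)
a = [rng.random() for _ in range(200)]
data = {
    "A": a,
    "B": [v + 0.05 * rng.random() for v in a],
    "C": [rng.random() for _ in range(200)],
}
mi = {t: {s: mutual_information(data[s], data[t]) for s in data if s != t}
      for t in data}
scaffolds = [sample_scaffold(mi, rng) for _ in range(5)]

# Stand-ins for simulations from three trained models of one observable.
combined = ensemble_prediction([[0.1, 0.5, 0.9],
                                [0.2, 0.4, 1.0],
                                [0.0, 0.6, 0.8]])
```

In this toy setting the estimated mutual information between A and B is much larger than between A and C, so sampled scaffolds tend to favor the A-B interaction, while the combined trajectory damps the disagreement between individual models.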
