A Bayesian Active Learning Experimental Design for Inferring Signaling Networks

Machine learning methods for learning network structure are applied to quantitative proteomics experiments to reverse-engineer intracellular signal transduction networks. They provide insight into the rewiring of signaling in the context of a disease or a phenotype. To learn the causal patterns of influence between proteins in the network, these methods require experiments that include targeted interventions fixing the activity of specific proteins. However, such interventions are costly and add experimental complexity. We describe an active learning strategy for selecting optimal interventions. Our approach takes pathway databases and historical data sets as inputs, expresses them as prior probability distributions on network structures, and selects the interventions that maximize their expected contribution to structure learning. Evaluations on simulated and real data show that the strategy reduces the detection error of validated edges compared with an unguided choice of interventions and avoids redundant interventions, thereby increasing the effectiveness of the experiment.
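The core idea, turning prior knowledge into a distribution over structures and then picking the intervention whose outcome is expected to be most informative, can be illustrated on a toy problem. The sketch below is not the authors' implementation; it is a minimal illustration under several stated assumptions. The three proteins, the candidate structures, `PATHWAY_EDGES`, `BETA`, and `EPS` are all hypothetical. The prior uses a simple energy-style penalty on edges that disagree with a reference pathway. An intervention on a node is assumed to make exactly its descendants respond, with each response misread with probability `EPS`. "Expected contribution to structure learning" is taken here to mean expected reduction in posterior entropy over structures (equivalently, the mutual information between structure and outcome).

```python
import itertools
import math

# Hypothetical toy problem: three proteins and three candidate structures,
# each a set of directed edges (parent, child).
NODES = ["A", "B", "C"]
CANDIDATES = {
    "A->B->C": {("A", "B"), ("B", "C")},
    "A->B, A->C": {("A", "B"), ("A", "C")},
    "C->B->A": {("C", "B"), ("B", "A")},
}
# Hypothetical edges taken from a pathway database.
PATHWAY_EDGES = {("A", "B"), ("B", "C")}
BETA = 1.0  # assumed strength of belief in the pathway database
EPS = 0.1   # assumed probability that a node's response is misread


def structure_prior(candidates, reference, beta):
    """P(G) proportional to exp(-beta * #edges that disagree with the reference)."""
    weights = {g: math.exp(-beta * len(edges ^ reference)) for g, edges in candidates.items()}
    total = sum(weights.values())
    return {g: w / total for g, w in weights.items()}


def descendants(edges, node):
    """Nodes reachable from `node` by following directed edges."""
    seen, stack = set(), [node]
    while stack:
        cur = stack.pop()
        for parent, child in edges:
            if parent == cur and child not in seen:
                seen.add(child)
                stack.append(child)
    return seen


def likelihood(outcome, edges, target):
    """P(outcome | G, intervention on target): descendants of the target
    should respond, all other nodes should not, each read with error EPS."""
    desc = descendants(edges, target)
    p = 1.0
    for node, responded in outcome.items():
        p *= (1 - EPS) if responded == (node in desc) else EPS
    return p


def entropy(dist):
    """Shannon entropy (bits) of a discrete distribution given as a dict."""
    return -sum(p * math.log2(p) for p in dist.values() if p > 0)


def posterior(prior, outcome, target):
    """Bayes update of the structure distribution; also returns P(outcome)."""
    joint = {g: prior[g] * likelihood(outcome, CANDIDATES[g], target) for g in prior}
    z = sum(joint.values())
    return {g: v / z for g, v in joint.items()}, z


def expected_info_gain(target, prior):
    """Expected drop in structure entropy from intervening on `target`."""
    others = [n for n in NODES if n != target]
    gain = entropy(prior)
    for bits in itertools.product([False, True], repeat=len(others)):
        post, p_outcome = posterior(prior, dict(zip(others, bits)), target)
        gain -= p_outcome * entropy(post)
    return gain


def choose_intervention(prior):
    """Score every candidate intervention and return the best one."""
    scores = {t: expected_info_gain(t, prior) for t in NODES}
    return max(scores, key=scores.get), scores


prior = structure_prior(CANDIDATES, PATHWAY_EDGES, BETA)
best, scores = choose_intervention(prior)
print(best, {t: round(s, 3) for t, s in scores.items()})
```

In this toy setup, intervening on B is preferred: fixing A or C only separates "C->B->A" from the other two structures, whereas fixing B produces a different expected response pattern under each candidate. In a sequential design, one would call `posterior` with the observed outcome, use the result as the next prior, and repeat. An intervention that no longer distinguishes the remaining plausible structures then scores near zero, which is one way to avoid redundant experiments.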