Reconstructing Causal Biological Networks through Active Learning

Reverse-engineering biological networks is a central problem in systems biology. Intervention data, such as gene knockouts or knockdowns, are typically used to tease apart causal relationships among genes. Under time or resource constraints, one must choose carefully which intervention experiments to carry out. Previous approaches for selecting the most informative interventions have largely focused on discrete Bayesian networks. However, continuous Bayesian networks are of great practical interest, especially in the study of complex biological systems and their quantitative properties. In this work, we present an efficient, information-theoretic active learning algorithm for Gaussian Bayesian networks (GBNs), which serve as important models for gene regulatory networks. In addition to providing linear-algebraic insights unique to GBNs, which lead to significant runtime improvements, we demonstrate the effectiveness of our method on data simulated with GBNs and on the DREAM4 network inference challenge data sets. Compared with random selection of intervention experiments, our method generally leads to faster recovery of the underlying network structure and faster convergence to the final distribution of confidence scores over candidate graph structures obtained using the full data.
