Optimal design of gene knockout experiments for gene regulatory network inference

Motivation: We addressed the problem of inferring a gene regulatory network (GRN) from gene expression data of knockout (KO) experiments. This inference is known to be underdetermined, and the GRN is not identifiable from the data. Past studies have shown that suboptimal design of experiments (DOE) contributes significantly to the identifiability issue of biological networks, including GRNs. However, optimizing DOE has received much less attention than developing methods for GRN inference.

Results: We developed the REDuction of UnCertain Edges (REDUCE) algorithm for finding the optimal gene KO experiment for inferring directed graphs (digraphs) of GRNs. REDUCE employs ensemble inference to define uncertain gene interactions, that is, interactions that could not be verified by prior data. The optimal experiment is the one that maximizes the number of uncertain interactions that could be verified by the resulting data. For this purpose, we introduced the concept of an edge separatoid, which gives a list of nodes (genes) whose removal would allow the verification of a particular gene interaction. Finally, we proposed a procedure that iterates over performing KO experiments, updating the ensemble and optimizing the DOE. The case studies, including the inference of the Escherichia coli GRN and the DREAM 4 100-gene GRNs, demonstrated the efficacy of the iterative GRN inference. In comparison to systematic KOs, REDUCE could provide much higher information return per gene KO experiment and consequently more accurate GRN estimates.

Conclusions: REDUCE represents an enabling tool for tackling the underdetermined GRN inference problem. Along with advances in gene deletion and automation technology, the iterative procedure brings efficient and fully automated GRN inference closer to reality.

Availability and implementation: MATLAB and Python scripts of REDUCE are available on www.cabsel.ethz.ch/tools/REDUCE.

Contact: rudi.gunawan@chem.ethz.ch

Supplementary information: Supplementary data are available at Bioinformatics online.
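The selection criterion described under Results can be illustrated with a minimal sketch. This is not the REDUCE implementation. It assumes that each uncertain edge already comes with its edge separatoid, and it reads the abstract's wording as meaning that an edge becomes verifiable once every gene in its separatoid has been knocked out. The function names `verifiable_edges` and `best_knockout`, the dictionary layout and the toy gene labels are all hypothetical.

```python
from typing import Dict, FrozenSet, Iterable, List, Set, Tuple

Edge = Tuple[str, str]          # (regulator, target), hypothetical layout
Knockout = FrozenSet[str]       # set of genes removed in one experiment


def verifiable_edges(ko: Iterable[str],
                     separatoids: Dict[Edge, Set[str]]) -> Set[Edge]:
    """Uncertain edges whose separatoid is fully covered by the knockout.

    Assumption: an edge counts as verifiable only if all genes in its
    separatoid are removed by the experiment.
    """
    ko_set = frozenset(ko)
    return {edge for edge, sep in separatoids.items()
            if frozenset(sep) <= ko_set}


def best_knockout(candidates: List[Iterable[str]],
                  separatoids: Dict[Edge, Set[str]]) -> Tuple[Knockout, Set[Edge]]:
    """Pick the candidate experiment that verifies the most uncertain edges.

    Ties are broken in favour of fewer knocked-out genes, then by the
    order in which the candidates were listed.
    """
    if not candidates:
        raise ValueError("at least one candidate knockout is required")
    scored = [(frozenset(ko), verifiable_edges(ko, separatoids))
              for ko in candidates]
    # max() returns the first maximal item, so listing order settles ties.
    return max(scored, key=lambda item: (len(item[1]), -len(item[0])))


# Toy example with made-up genes A..F.
toy_separatoids = {
    ("A", "B"): {"C"},
    ("A", "D"): {"C"},
    ("B", "D"): {"C", "E"},
    ("E", "A"): {"F"},
}
toy_candidates = [("C",), ("E",), ("F",), ("C", "E")]

chosen_ko, chosen_edges = best_knockout(toy_candidates, toy_separatoids)
print(sorted(chosen_ko), len(chosen_edges))
```

In the iterative procedure the abstract describes, a step like this would sit between the ensemble update and the next KO experiment. How the separatoids themselves are computed from the ensemble is left out here because the abstract does not specify it.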
