An effective structure learning method for constructing gene networks

Motivation: Bayesian network methods have shown promise in gene regulatory network reconstruction because they can capture causal relationships between genes and can handle the noisy data produced by biological experiments. Learning network structures, however, is NP-hard, so heuristic methods such as hill climbing are used for structure learning. For networks of even moderate size, hill climbing methods are computationally inefficient, and the learned structures may have relatively low accuracy. The purpose of this article is to present a novel structure learning method for gene network discovery.

Results: We present a novel structure learning method that reconstructs the underlying gene networks from observational gene expression data. Unlike hill climbing approaches, the proposed method first constructs an undirected network based on the mutual information between pairs of nodes and then splits this structure into substructures. The orientation of each edge is then obtained by optimizing a scoring function for each substructure. We evaluate the method on two benchmark network datasets with known structures. The results show that the proposed method identifies networks that are close to the optimal structures, and that it outperforms hill climbing methods in both computation time and accuracy of the predicted structures. We also apply the method to gene expression data measured during the yeast cell cycle and show its effectiveness for network reconstruction.
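The Results paragraph outlines a three-stage pipeline: build an undirected skeleton from pairwise mutual information, split it into substructures, and orient the edges of each substructure by optimizing a score. The abstract does not say which mutual information threshold, decomposition, or scoring function is used, so the following Python sketch fills those in with assumptions: a fixed MI threshold, connected components as the "substructures" (the paper may well use a different decomposition), the BIC score for discrete data, and exhaustive search over edge orientations within each substructure. All function names are hypothetical and are not taken from the paper.

```python
# Hypothetical sketch: the MI threshold, the decomposition (connected
# components) and the BIC score are assumptions, not the paper's exact choices.
import itertools
import math
from collections import Counter


def mutual_information(x, y):
    """Empirical mutual information (in nats) between two discrete sequences."""
    n = len(x)
    pxy, px, py = Counter(zip(x, y)), Counter(x), Counter(y)
    # sum over observed (a, b) of p(a,b) * log(p(a,b) / (p(a) p(b)))
    return sum((c / n) * math.log(c * n / (px[a] * py[b])) for (a, b), c in pxy.items())


def mi_skeleton(data, threshold):
    """Undirected edges (i, j), i < j, between variables whose MI exceeds threshold."""
    cols = list(zip(*data))
    return {(i, j) for i, j in itertools.combinations(range(len(cols)), 2)
            if mutual_information(cols[i], cols[j]) > threshold}


def components(nodes, edges):
    """Split the skeleton into connected components (a stand-in for 'substructures')."""
    adj = {v: set() for v in nodes}
    for i, j in edges:
        adj[i].add(j)
        adj[j].add(i)
    seen, comps = set(), []
    for v in nodes:
        if v in seen:
            continue
        comp, stack = set(), [v]
        while stack:  # depth-first traversal from v
            u = stack.pop()
            if u not in comp:
                comp.add(u)
                stack.extend(adj[u] - comp)
        seen |= comp
        comps.append(comp)
    return comps


def family_bic(data, child, parents):
    """BIC term of one discrete variable given its parent set."""
    n = len(data)
    joint = Counter((tuple(row[p] for p in parents), row[child]) for row in data)
    pa_counts = Counter(tuple(row[p] for p in parents) for row in data)
    loglik = sum(c * math.log(c / pa_counts[pa]) for (pa, _), c in joint.items())
    r = len({row[child] for row in data})                          # observed child states
    q = math.prod(len({row[p] for row in data}) for p in parents)  # parent configurations
    return loglik - 0.5 * math.log(n) * q * (r - 1)


def is_acyclic(nodes, arcs):
    """Kahn's algorithm: True if the directed arcs contain no cycle."""
    indeg = {v: 0 for v in nodes}
    out = {v: [] for v in nodes}
    for a, b in arcs:
        indeg[b] += 1
        out[a].append(b)
    ready = [v for v in indeg if indeg[v] == 0]
    visited = 0
    while ready:
        v = ready.pop()
        visited += 1
        for w in out[v]:
            indeg[w] -= 1
            if indeg[w] == 0:
                ready.append(w)
    return visited == len(indeg)


def orient(data, comp, edges):
    """Try every orientation of the edges inside comp; keep the best-scoring DAG."""
    local = sorted(e for e in edges if e[0] in comp and e[1] in comp)
    best, best_score = None, -math.inf
    for flips in itertools.product((False, True), repeat=len(local)):  # 2^m candidates
        arcs = [(j, i) if f else (i, j) for (i, j), f in zip(local, flips)]
        if not is_acyclic(comp, arcs):
            continue
        score = sum(family_bic(data, v, [a for a, b in arcs if b == v]) for v in comp)
        if score > best_score:
            best, best_score = arcs, score
    return best


def learn_structure(data, threshold=0.05):
    """MI skeleton -> split into components -> orient each component by BIC."""
    edges = mi_skeleton(data, threshold)
    arcs = []
    for comp in components(range(len(data[0])), edges):
        arcs.extend(orient(data, comp, edges))
    return arcs
```

For example, with `data = [(x, x & z, z) for x in (0, 1) for z in (0, 1)] * 25` (variable 1 is the logical AND of variables 0 and 2), `learn_structure(data, threshold=0.1)` returns `[(0, 1), (2, 1)]`, the v-structure 0 → 1 ← 2. Two design points are worth noting. Exhaustive orientation costs 2^m score evaluations for a substructure with m edges, which is why splitting the skeleton into small pieces matters. Also, BIC is score-equivalent, so it only separates orientations that change the v-structures; within a plain chain, any Markov-equivalent orientation ties and the sketch simply keeps the first one enumerated.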
