Gene regulatory network reconstruction using dynamic Bayesian networks

High-content technologies such as DNA microarrays can provide a system-scale overview of how genes interact with each other in a network context. Various mathematical methods and computational approaches have been proposed to reconstruct gene regulatory networks (GRNs), including Boolean networks, information theory, differential equations and Bayesian networks. GRN reconstruction faces major intrinsic challenges on both the experimental and theoretical fronts, because the inputs and outputs of the molecular processes are unclear and the underlying principles are unknown or too complex. In this work, we focused on improving the accuracy and speed of GRN reconstruction with a method based on dynamic Bayesian networks (DBNs). A commonly used structure-learning algorithm for DBNs is based on REVEAL (Reverse Engineering Algorithm). However, REVEAL has several limitations when applied to GRN reconstruction: it does not recover the two-stage temporal Bayes network (2TBN) well; it is slow and inaccurate for high-dimensional networks with more than a hundred nodes; and it cannot reconstruct a network with 400 nodes at all. We implemented a DBN structure-learning algorithm that uses Friedman's score function in place of REVEAL, tested it on the reconstruction of both synthetic networks and real yeast networks, and compared it with REVEAL with and without a preprocessed network generated by Zou and Conzen's algorithm. The new score metric improved the precision and recall of GRN reconstruction. We also used the DBN approach to reconstruct gene regulatory networks in earthworms and analyzed them to identify the mechanism of chemical-induced reversible neurotoxicity, using tools that map relevant genes from the non-model organism's pathways onto model-organism pathways.
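The abstract names the ingredients of the method but not their formulas. The following is a minimal sketch, not the authors' implementation, of score-based structure learning for the 2TBN transition model and of the precision/recall evaluation. Several choices are assumptions: the BIC score stands in for Friedman's score function (whose exact form is not given here), expression is assumed to be discretized to binary states, each gene's parents are found by exhaustive search over sets of at most `max_parents` genes, and the helper names (`bic_family_score`, `learn_2tbn`, `precision_recall`) and the three-gene toy network are hypothetical.

```python
import itertools
import math
from collections import Counter


def bic_family_score(data, child, parents, arity=2):
    """BIC score for gene `child` at slice t+1 given `parents` at slice t.

    Assumption: BIC is a stand-in for the score used in the paper.
    `data` is a list of time series; each series is a list of tuples holding
    one discretized expression state per gene. Every pair of consecutive time
    points contributes one training sample to the transition model.
    """
    joint = Counter()     # (parent configuration, child value) -> count
    marginal = Counter()  # parent configuration -> count
    n = 0
    for series in data:
        for prev, curr in zip(series, series[1:]):
            cfg = tuple(prev[p] for p in parents)
            joint[(cfg, curr[child])] += 1
            marginal[cfg] += 1
            n += 1
    # Maximum-likelihood log-likelihood of the child's conditional distribution
    loglik = sum(c * math.log(c / marginal[cfg]) for (cfg, _), c in joint.items())
    # Free parameters: (arity - 1) per parent configuration
    n_params = arity ** len(parents) * (arity - 1)
    return loglik - 0.5 * math.log(n) * n_params


def learn_2tbn(data, n_genes, max_parents=2, arity=2):
    """Return inter-slice edges (parent, child) of the best-scoring 2TBN.

    All edges run from slice t to slice t+1, so no cycle can form and each
    gene's parent set can be optimized independently of the others.
    """
    edges = set()
    for child in range(n_genes):
        best_score, best_parents = -math.inf, ()
        for k in range(max_parents + 1):
            for parents in itertools.combinations(range(n_genes), k):
                score = bic_family_score(data, child, parents, arity)
                if score > best_score:
                    best_score, best_parents = score, parents
        edges.update((p, child) for p in best_parents)
    return edges


def precision_recall(predicted, true_edges):
    """Precision and recall of a predicted edge set against a gold standard."""
    tp = len(predicted & true_edges)
    precision = tp / len(predicted) if predicted else 0.0
    recall = tp / len(true_edges) if true_edges else 0.0
    return precision, recall


# Hypothetical toy Boolean network: g0' = g2, g1' = g0, g2' = NOT g1
def step(state):
    g0, g1, g2 = state
    return (g2, g0, 1 - g1)


true_edges = {(2, 0), (0, 1), (1, 2)}
data = []
for start in itertools.product((0, 1), repeat=3):  # all 8 initial states
    series = [start]
    for _ in range(5):
        series.append(step(series[-1]))
    data.append(series)

learned = learn_2tbn(data, n_genes=3)
print(sorted(learned), precision_recall(learned, true_edges))
# prints: [(0, 1), (1, 2), (2, 0)] (1.0, 1.0)
```

Because the score decomposes over genes, a Bayesian score such as BDe could replace `bic_family_score` without touching the search. The exhaustive parent search grows combinatorially with the number of genes, so restricting each gene's candidate parents in advance, as a preprocessed network would, is what keeps this kind of search tractable on larger networks.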

[1] Andrew W. Moore, et al. Cached Sufficient Statistics for Efficient Machine Learning with Large Datasets, 1998, J. Artif. Intell. Res.

[2] M. Peitsch, et al. Verification of systems biology research in the age of collaborative competition, 2011, Nature Biotechnology.

[3] M. Reinders, et al. Genetic network modeling, 2002, Pharmacogenomics.

[4] Chaoyang Zhang, et al. A novel gene network inference algorithm using predictive minimum description length approach, 2010, BMC Systems Biology.

[5] Andrew W. Moore, et al. Finding optimal Bayesian networks by dynamic programming, 2005.

[6] Hidde de Jong, et al. Modeling and Simulation of Genetic Regulatory Systems: A Literature Review, 2002, J. Comput. Biol.

[7] Satoru Miyano, et al. Finding Optimal Models for Small Gene Networks, 2003.

[8] Yongliang Yang, et al. Target discovery from data mining approaches, 2009, Drug Discovery Today.

[9] Dario Floreano, et al. GeneNetWeaver: in silico benchmark generation and performance profiling of network inference methods, 2011, Bioinform.

[10] I. S. Kohane, et al. Mutual information relevance networks: functional genomic clustering using pairwise entropy measurements, 1999, Pacific Symposium on Biocomputing.

[11] Diego di Bernardo, et al. Inference of gene regulatory networks and compound mode of action from time course gene expression profiles, 2006, Bioinform.

[12] Ting Chen, et al. Modeling Gene Expression with Differential Equations, 1998, Pacific Symposium on Biocomputing.

[13] Sangsoo Kim, et al. Differential coexpression analysis using microarray data and its application to human cancer, 2005.

[14] Nir Friedman, et al. The Bayesian Structural EM Algorithm, 1998, UAI.

[15] Satoru Miyano, et al. Estimation of Genetic Networks and Functional Structures Between Genes by Using Bayesian Networks and Nonparametric Regression, 2001, Pacific Symposium on Biocomputing.

[16] J. Collins, et al. Large-Scale Mapping and Validation of Escherichia coli Transcriptional Regulation from a Compendium of Expression Profiles, 2007, PLoS Biology.

[17] Desmond I. Bannon, et al. RDX Binds to the GABAA Receptor–Convulsant Site and Blocks GABAA Receptor–Mediated Currents in the Amygdala: A Mechanism for RDX-Induced Seizures, 2010, Environmental Health Perspectives.

[18] L. Baum, et al. Statistical Inference for Probabilistic Functions of Finite State Markov Chains, 1966.

[19] S. Fuhrman, et al. Reveal, a general reverse engineering algorithm for inference of genetic network architectures, 1998, Pacific Symposium on Biocomputing.

[20] D. Cavalieri, et al. Fundamentals of cDNA microarray data analysis, 2003, Trends in Genetics: TIG.

[21] Kathleen F. Kerr, et al. Standardizing global gene expression analysis between laboratories and across platforms, 2005, Nature Methods.

[22] Jin Tian, et al. A Branch-and-Bound Algorithm for MDL Learning Bayesian Networks, 2000, UAI.

[23] Adriana Climescu-Haulica, et al. A stochastic differential equation model for transcriptional regulatory networks, 2007, BMC Bioinformatics.

[24] J. Collins, et al. Chemogenomic profiling on a genome-wide scale using reverse-engineered gene networks, 2005, Nature Biotechnology.

[25] Jurgen Del-Favero, et al. Expression profiling of endocrine-disrupting compounds using a customized Cyprinus carpio cDNA microarray, 2006, Toxicological Sciences: an official journal of the Society of Toxicology.

[26] J. J. Greene, et al. Identification of interferon-modulated proliferation-related cDNA sequences, 1987, Proceedings of the National Academy of Sciences of the United States of America.

[27] Qiang Ji, et al. Structure learning of Bayesian networks using constraints, 2009, ICML '09.

[28] James Cussens, et al. Bayesian network learning with cutting planes, 2011, UAI.

[29] Edward J. Perkins, et al. Neurochemical and electrophysiological diagnosis of reversible neurotoxicity in earthworms exposed to sublethal concentrations of CL-20, 2009, Environmental Science and Pollution Research International.

[30] Jarmila Nahalkova, et al. Comparative analysis of transcript abundance in Pinus sylvestris after challenge with a saprotrophic, pathogenic or mutualistic fungus, 2008, Tree Physiology.

[31] Roger E. Bumgarner, et al. Sample size for detecting differentially expressed genes in microarray experiments, 2004, BMC Genomics.

[32] Amer M. Diab, et al. Hepatic transcriptomic profiles of European flounder (Platichthys flesus) from field sites and computational approaches to predict site from stress gene responses following exposure to model toxicants, 2008, Aquatic Toxicology.

[33] J. Collins, et al. Inferring Genetic Networks and Identifying Compound Mode of Action via Expression Profiling, 2003, Science.

[34] Liang-Tsung Huang, et al. An integrated method for cancer classification and rule extraction from microarray data, 2008, Journal of Biomedical Science.

[35] Igor V. Tetko, et al. Optimization models for cancer classification: extracting gene interaction information from microarray expression data, 2004, Bioinform.

[36] Geert Molenberghs, et al. Graphical Exploration of Gene Expression Data: A Comparative Study of Three Multivariate Methods, 2003, Biometrics.

[37] Dario Floreano, et al. Generating Realistic In Silico Gene Networks for Performance Assessment of Reverse Engineering Methods, 2009, J. Comput. Biol.

[38] Kevin P. Murphy, et al. Learning the Structure of Dynamic Probabilistic Networks, 1998, UAI.

[39] Pedro Larrañaga, et al. Filter versus wrapper gene selection approaches in DNA microarray domains, 2004, Artif. Intell. Medicine.

[40] Jesper Tegnér, et al. Reverse engineering gene networks using singular value decomposition and robust regression, 2002, Proceedings of the National Academy of Sciences of the United States of America.

[41] Nir Friedman, et al. Learning Belief Networks in the Presence of Missing Values and Hidden Variables, 1997, ICML.

[42] M. Robinson, et al. Cloning of cDNAs encoding two related 100-kD coated vesicle proteins (alpha-adaptins), 1989, The Journal of Cell Biology.

[43] G. K. Ackers, et al. Quantitative model for gene regulation by lambda phage repressor, 1982, Proceedings of the National Academy of Sciences of the United States of America.

[44] Gregory F. Cooper, et al. A Bayesian method for the induction of probabilistic networks from data, 1992, Machine Learning.

[45] Gustavo Stolovitzky, et al. Lessons from the DREAM2 Challenges, 2009, Annals of the New York Academy of Sciences.

[46] N. D. Clarke, et al. Towards a Rigorous Assessment of Systems Biology Models: The DREAM3 Challenges, 2010, PLoS ONE.

[47] P. Maini, et al. Spatial pattern formation in chemical and biological systems, 1997.

[48] Patrick Tan, et al. Genetic algorithms applied to multi-class prediction for the analysis of gene expression data, 2003, Bioinform.

[49] Rong-Lin Wang, et al. DNA Microarray-based ecotoxicological biomarker discovery in a small fish model species, 2008, Environmental Toxicology and Chemistry.

[50] J. Mesirov, et al. Molecular classification of cancer: class discovery and class prediction by gene expression monitoring, 1999, Science.

[51] Edward J. Perkins, et al. Design, Validation and Annotation of Transcriptome-Wide Oligonucleotide Probes for the Oligochaete Annelid Eisenia fetida, 2010, PLoS ONE.

[52] Mehdi Pirooznia, et al. Toxicogenomic analysis provides new insights into molecular mechanisms of the sublethal toxicity of 2,4,6-trinitrotoluene in Eisenia fetida, 2007, Environmental Science & Technology.

[53] Kuo-Chu Chang, et al. Comparison of score metrics for Bayesian network learning, 2002, IEEE Trans. Syst. Man Cybern. Part A.

[54] M. Dehmer, et al. Analysis of Microarray Data: A Network-Based Approach, 2008.

[55] S. Stürzenbaum, et al. Comparative transcriptomic responses to chronic cadmium, fluoranthene, and atrazine exposure in Lumbricus rubellus, 2008, Environmental Science & Technology.

[56] Allan Tucker, et al. A Bayesian network approach to explaining time series with changing structure, 2004, Intell. Data Anal.

[57] S. P. Fodor, et al. Determination of ancestral alleles for human single-nucleotide polymorphisms using high-density oligonucleotide arrays, 1999, Nature Genetics.

[58] Ilya Shmulevich, et al. On Learning Gene Regulatory Networks Under the Boolean Network Model, 2003, Machine Learning.

[59] Ash A. Alizadeh, et al. Genome-wide analysis of DNA copy-number changes using cDNA microarrays, 1999, Nature Genetics.

[60] Edward J. Perkins, et al. Gene expression profiling in Daphnia magna, part II: validation of a copper specific gene expression signature with effluent from two copper mines in California, 2008, Environmental Science & Technology.

[61] Linda C. van der Gaag, et al. Probabilistic Graphical Models, 2014, Lecture Notes in Computer Science.

[62] Min Zou, et al. A new dynamic Bayesian network (DBN) approach for identifying gene regulatory networks from time course microarray data, 2005, Bioinform.

[63] Michael Simini, et al. Effects of Energetic Materials on Soil Organisms, 2009.

[64] Stuart J. Russell, et al. Dynamic Bayesian networks: representation, inference and learning, 2002.

[65] Oded Maimon, et al. Evaluation of gene-expression clustering via mutual information distance measure, 2007, BMC Bioinformatics.

[66] Armin Shmilovici, et al. Identification of transcription factor binding sites with variable-order Bayesian networks, 2005, Bioinform.

[67] M. Xiong, et al. Biomarker Identification by Feature Wrappers, 2022.

[68] Alex E. Lash, et al. Gene Expression Omnibus: NCBI gene expression and hybridization array data repository, 2002, Nucleic Acids Res.

[69] Tommi S. Jaakkola, et al. Combining Location and Expression Data for Principled Discovery of Genetic Regulatory Network Models, 2001, Pacific Symposium on Biocomputing.

[70] S. Kauffman. The large scale structure and dynamics of gene control circuits: an ensemble approach, 1974, Journal of Theoretical Biology.

[71] Michael Ruogu Zhang, et al. Comprehensive identification of cell cycle-regulated genes of the yeast Saccharomyces cerevisiae by microarray hybridization, 1998, Molecular Biology of the Cell.

[72] Craig Boutilier, et al. Proceedings of the Sixteenth Conference on Uncertainty in Artificial Intelligence (2000), 2013, ArXiv.

[73] T. Ideker, et al. Comprehensive curation and analysis of global interaction networks in Saccharomyces cerevisiae, 2006, Journal of Biology.

[74] D. Floreano, et al. Revealing strengths and weaknesses of methods for gene network inference, 2010, Proceedings of the National Academy of Sciences.

[75] Changhe Yuan, et al. Memory-Efficient Dynamic Programming for Learning Optimal Bayesian Networks, 2011, AAAI.

[76] Changhe Yuan, et al. Learning Optimal Bayesian Networks Using A* Search, 2011, IJCAI.

[77] Robin B. Gasser, et al. A hitchhiker's guide to expressed sequence tag (EST) analysis, 2006, Briefings Bioinform.

[78] Tommi S. Jaakkola, et al. Learning Bayesian Network Structure using LP Relaxations, 2010, AISTATS.

[79] Douwe Molenaar, et al. Gene expression analysis reveals a gene set discriminatory to different metals in soil, 2010, Toxicological Sciences: an official journal of the Society of Toxicology.

[80] Carsten O. Daub, et al. The mutual information: Detecting and evaluating dependencies between variables, 2002, ECCB.

[81] Changhe Yuan, et al. Improving the Scalability of Optimal Bayesian Network Learning with External-Memory Frontier Breadth-First Branch and Bound Search, 2011, UAI.

[82] D. Pe’er, et al. Module networks: identifying regulatory modules and their condition-specific regulators from gene expression data, 2003, Nature Genetics.

[83] David Maxwell Chickering, et al. Learning Bayesian Networks: The Combination of Knowledge and Statistical Data, 1994, Machine Learning.

[84] Bart Deplancke, et al. Gene Regulatory Networks, 2012, Methods in Molecular Biology.

[85] Neal S. Holter, et al. Dynamic modeling of gene expression data, 2001, Proceedings of the National Academy of Sciences of the United States of America.

[86] Gary Moran, et al. Comparative genomics using Candida albicans DNA microarrays reveals absence and divergence of virulence-associated genes in Candida dubliniensis, 2004, Microbiology.

[87] Mehdi Pirooznia, et al. Transcriptomic analysis of RDX and TNT interactive sublethal effects in the earthworm Eisenia fetida, 2008, BMC Genomics.

[88] Tomi Silander, et al. A Simple Approach for Finding the Globally Optimal Bayesian Network Structure, 2006, UAI.

[89] Chaoyang Zhang, et al. Comparison of probabilistic Boolean network and dynamic Bayesian network approaches for inferring gene regulatory networks, 2007, BMC Bioinformatics.

[90] Gerald T. Ankley, et al. Toxicogenomics in regulatory ecotoxicology, 2006, Environmental Science & Technology.

[91] H. Meinhardt, et al. A theory of biological pattern formation, 1972, Kybernetik.