Integrate qualitative biological knowledge for gene regulatory network reconstruction with dynamic Bayesian networks

Reconstructing gene regulatory networks from microarray expression data is essential in systems biology, especially dynamic gene networks that reveal the temporal program of gene expression. Microarray data are noisy and under-sampled. To overcome these challenges, a promising direction is to develop data fusion methods that integrate legacy biological knowledge into gene network reconstruction. However, previous research has accumulated a large body of qualitative biological knowledge. Although valuable, this knowledge has received relatively little attention in dynamic gene network reconstruction because it is not directly compatible with quantitative computational models. In this dissertation, I introduce a novel method that fuses qualitative gene interaction information with quantitative microarray data within the Dynamic Bayesian Network (DBN) framework. The method extends previous data integration methods in two ways. First, it uses Bayesian Networks to incorporate qualitative biological knowledge without requiring the involvement of human experts. Second, it takes time-series data as input to produce dynamic gene networks. Experimental results show that the proposed method outperforms the standard DBN method, which uses microarray data alone, in both accuracy and consistency.
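The abstract describes the fusion only at a high level. The following Python sketch illustrates the general idea; it is not the dissertation's actual formulation. It scores first-order DBN structures on discretized time-series data using BIC. It then adds a log structure prior in which qualitative evidence for an interaction raises the prior probability of the corresponding edge. Several details are assumptions made for this example: the `knowledge` dictionary, the default edge probability of 0.1, the function names `family_bic`, `edge_log_prior` and `learn_dbn`, and the exhaustive search over small parent sets.

```python
import math
from collections import Counter
from itertools import combinations

# Illustrative sketch only: a first-order DBN (edges from slice t to slice t+1)
# over discretized expression data, scored by BIC plus a knowledge-based prior.
# All names here (family_bic, edge_log_prior, learn_dbn, knowledge) are hypothetical.


def family_bic(data, child, parents):
    """BIC of one family: `child` at time t+1 given `parents` at time t.

    `data` is a list of time points; each time point is a tuple of discrete
    gene states indexed by gene number.
    """
    pairs = [(tuple(data[t][p] for p in parents), data[t + 1][child])
             for t in range(len(data) - 1)]
    joint = Counter(pairs)
    parent_counts = Counter(cfg for cfg, _ in pairs)
    # Maximum-likelihood log-likelihood of the conditional distribution.
    loglik = sum(n * math.log(n / parent_counts[cfg])
                 for (cfg, _), n in joint.items())
    # Free parameters: (child states - 1) * number of parent configurations.
    r_child = len({row[child] for row in data})
    n_configs = 1
    for p in parents:
        n_configs *= len({row[p] for row in data})
    k = (r_child - 1) * n_configs
    return loglik - 0.5 * k * math.log(len(pairs))


def edge_log_prior(child, parents, n_genes, knowledge, default=0.1):
    """Log prior of a parent set under independent edge probabilities.

    `knowledge` maps (regulator, target) to a probability in (0, 1) derived
    from qualitative evidence; unlisted edges get `default` (assumed value).
    """
    total = 0.0
    for j in range(n_genes):
        p = knowledge.get((j, child), default)
        total += math.log(p) if j in parents else math.log(1.0 - p)
    return total


def learn_dbn(data, knowledge, max_parents=2, default=0.1):
    """Return the highest-scoring parent set for every gene.

    Inter-slice edges cannot form cycles, so each gene is optimized
    independently by exhaustive search over small parent sets.
    """
    n_genes = len(data[0])
    best = {}
    for child in range(n_genes):
        scored = []
        for size in range(max_parents + 1):
            for parents in combinations(range(n_genes), size):
                score = (family_bic(data, child, parents)
                         + edge_log_prior(child, parents, n_genes,
                                          knowledge, default))
                scored.append((score, parents))
        best[child] = max(scored)[1]
    return best
```

In this sketch, every edge points from slice t to slice t+1, so the structure cannot contain a cycle. That is why each gene's parent set can be chosen independently. For example, with `learn_dbn(data, {(0, 1): 0.9})`, including the edge from gene 0 to gene 1 adds log(0.9/0.1), about 2.2 nats, to that family's score. Excluding the edge would instead cost the same amount. A real system would need a principled way to turn qualitative evidence into these probabilities, and a heuristic search once exhaustive enumeration becomes infeasible.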
