Multimodal networks in biology

A multimodal network (MMN) is a novel mathematical construct that captures the structure of biological networks, computational network models, and relationships from biological databases. An MMN subsumes the structure of graphs and hypergraphs, whether undirected or directed. Formally, an MMN is a triple (V, E, M), where V is a set of vertices, E is a set of modal hyperedges, and M is a set of modes. A modal hyperedge e = (T, H, A, m) ∈ E is an ordered 4-tuple in which T, H, A ⊆ V and m ∈ M. The sets T, H, and A are the tail, head, and associate of e, respectively, and m is its mode. In the context of biology, each vertex is a biological entity, each hyperedge is a relationship, and each mode is a type of relationship (e.g., 'forms complex' or 'is a'). Within the space of multimodal networks ℳ, structural operations such as union, intersection, hyperedge contraction, subnetwork selection, and graph or hypergraph projection can be performed. A denotational semantics approach is used to specify the semantics of each hyperedge in an MMN in terms of the interactions among its vertices. This is done by mapping each hyperedge e to a hyperedge code algo_V(e), an algorithm that details how the vertices in V(e) are used and updated. A semantic MMN-based model is a function of a given evaluation schedule of hyperedge codes and the current state of the model, where the state is a set of vertex-value pairs. An MMN-based computational system is implemented as a proof of concept to determine its benefits empirically. This system consists of an MMN database populated with data from various biological databases, MMN operators implemented as database functions, graph operations implemented in C++ using LEDA, and mmnsh, a shell scripting language that provides a consistent interface to both data and operators. It is demonstrated that computational network models may enrich the MMN database and that MMN data may be used as input to other computational tools and environments.
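The formal definition above can be illustrated with a minimal Python sketch. It is not the system described here (which uses a database and C++ with LEDA). The class and method names are hypothetical, the vertex names are placeholders, and three points are assumptions made only for illustration: that V(e) is the union T ∪ H ∪ A, that union is component-wise set union, and that a graph projection draws one arc from each tail vertex to each head vertex.

```python
from dataclasses import dataclass
from typing import FrozenSet, Iterable, Set, Tuple


@dataclass(frozen=True)
class ModalHyperedge:
    """A modal hyperedge e = (T, H, A, m)."""
    tail: FrozenSet[str]
    head: FrozenSet[str]
    associate: FrozenSet[str]
    mode: str

    def vertices(self) -> FrozenSet[str]:
        # Assumption: V(e) is taken to be T ∪ H ∪ A.
        return self.tail | self.head | self.associate


@dataclass
class MMN:
    """A multimodal network (V, E, M)."""
    vertices: Set[str]
    edges: Set[ModalHyperedge]
    modes: Set[str]

    def add_edge(self, tail: Iterable[str], head: Iterable[str],
                 associate: Iterable[str], mode: str) -> ModalHyperedge:
        edge = ModalHyperedge(frozenset(tail), frozenset(head),
                              frozenset(associate), mode)
        # Enforce T, H, A ⊆ V and m ∈ M from the definition.
        if not edge.vertices() <= self.vertices:
            raise ValueError("hyperedge uses vertices outside V")
        if mode not in self.modes:
            raise ValueError("hyperedge mode is not in M")
        self.edges.add(edge)
        return edge

    def union(self, other: "MMN") -> "MMN":
        # Assumption: union is the component-wise union of V, E, and M.
        return MMN(self.vertices | other.vertices,
                   self.edges | other.edges,
                   self.modes | other.modes)

    def graph_projection(self) -> Set[Tuple[str, str]]:
        # Assumption: one possible projection, an arc from every tail
        # vertex to every head vertex; associates are ignored here.
        return {(t, h) for e in self.edges for t in e.tail for h in e.head}


# Placeholder entities A and B forming a complex AB.
net = MMN({"A", "B", "AB"}, set(), {"forms complex", "is a"})
edge = net.add_edge({"A", "B"}, {"AB"}, set(), "forms complex")
```

Making the hyperedge immutable and hashable lets E be an ordinary set, so set-based operations such as union need no extra bookkeeping.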
A simulator is developed that computes the resulting state of a semantic MMN model from an initial state and a schedule of hyperedge codes.
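The idea of such a simulator can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the simulator developed in this work: the state is modeled as a dictionary of vertex-value pairs, a hyperedge code as a function from the values of its own vertices to updated values, and the schedule as an ordered list of (vertex set, code) pairs. The names `simulate` and `forms_complex`, and the update rule inside `forms_complex`, are hypothetical.

```python
from typing import Callable, Dict, Iterable, List, Mapping, Tuple

State = Dict[str, float]
# A hyperedge code reads the values of its vertices and returns updates.
HyperedgeCode = Callable[[Mapping[str, float]], Mapping[str, float]]
Schedule = List[Tuple[Iterable[str], HyperedgeCode]]


def simulate(initial_state: Mapping[str, float], schedule: Schedule) -> State:
    """Evaluate hyperedge codes in schedule order and return the final state."""
    state = dict(initial_state)  # copy, so the initial state is not mutated
    for vertices, code in schedule:
        allowed = set(vertices)
        # The code only sees the vertices of its own hyperedge.
        view = {v: state[v] for v in allowed}
        updates = code(view)
        stray = set(updates) - allowed
        if stray:
            raise ValueError(f"code updated vertices outside its hyperedge: {stray}")
        state.update(updates)
    return state


def forms_complex(tail: Iterable[str], head: Iterable[str]) -> HyperedgeCode:
    """Hypothetical code: move the limiting tail amount into the head vertices."""
    tail, head = tuple(tail), tuple(head)

    def code(values: Mapping[str, float]) -> Mapping[str, float]:
        amount = min(values[v] for v in tail)
        updates = {v: values[v] - amount for v in tail}
        updates.update({v: values[v] + amount for v in head})
        return updates

    return code


initial = {"A": 3.0, "B": 2.0, "AB": 0.0}
step = (("A", "B", "AB"), forms_complex(("A", "B"), ("AB",)))
final = simulate(initial, [step, step])
```

Because the schedule is an explicit ordered list, evaluating the same hyperedge codes in a different order is just a matter of passing a different list, which matches the description of a model as a function of both the schedule and the current state.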
