Finding patterns in biochemical reaction networks

Computational models in biology encode molecular and cell biological processes. Many of them can be represented as biochemical reaction networks. When studying such networks, one is often interested in systems that share similar reactions and mechanisms. Typical goals are to understand the parts of a model, to identify recurring patterns, and to find biologically relevant motifs. Both the large number of models available for such a search and the large size of individual models call for automated methods. Yet the generic problem of finding patterns in large networks is computationally hard. As a consequence, only partial solutions for a structural analysis of models exist. Here we introduce a tool chain that identifies recurring patterns in biochemical reaction networks. We began this work with an evaluation of algorithms for the identification of frequent subgraphs. Then, we created graph representations of existing SBML models and ran the most suitable algorithm on the data. The result was a list of reaction patterns together with statistics about the occurrence of each pattern in the data set. The approach was validated with 575 SBML models from the curated branch of BioModels. We analysed how well the resulting patterns conform to expectations from the literature and from previous model statistics. In the future, the identified patterns can serve as a tool to measure the similarity of models.
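The abstract does not spell out the graph encoding or the mining algorithm, so the following Python sketch only illustrates the general idea under simplifying assumptions. Each model is reduced to a hypothetical list of `Reaction` records (no SBML parsing). The reactions then become nodes of a species-reaction bipartite graph with edges labelled by role. The only patterns counted are single-reaction stars, each described by its numbers of reactants, products and modifiers. This is plain support counting over a restricted pattern class, not gSpan or any other general frequent-subgraph miner. The helper names (`to_bipartite_edges`, `reaction_patterns`, `mine`) and the toy models are invented for illustration.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class Reaction:
    """Hypothetical minimal reaction record, standing in for a parsed SBML reaction."""
    reactants: tuple[str, ...]
    products: tuple[str, ...]
    modifiers: tuple[str, ...] = ()


def to_bipartite_edges(model_id, reactions):
    """Encode a model as a species-reaction bipartite graph: (source, target, role) edges."""
    edges = []
    for i, r in enumerate(reactions):
        rnode = f"{model_id}:R{i}"  # reaction node, unique per model
        edges += [(s, rnode, "reactant") for s in r.reactants]
        edges += [(rnode, s, "product") for s in r.products]
        edges += [(s, rnode, "modifier") for s in r.modifiers]
    return edges


def reaction_patterns(edges):
    """Canonical label of each one-reaction star: (#reactants, #products, #modifiers)."""
    roles = {}
    for src, dst, role in edges:
        # Reactant and modifier edges point into the reaction node; product edges point out.
        rnode = dst if role in ("reactant", "modifier") else src
        roles.setdefault(rnode, Counter())[role] += 1
    return [(c["reactant"], c["product"], c["modifier"]) for c in roles.values()]


def mine(models, min_support):
    """Return {pattern: (support, occurrences)} for patterns found in >= min_support models."""
    support = Counter()      # number of models containing the pattern at least once
    occurrences = Counter()  # total number of occurrences across all models
    for model_id, reactions in models.items():
        patterns = reaction_patterns(to_bipartite_edges(model_id, reactions))
        occurrences.update(patterns)
        support.update(set(patterns))
    return {p: (support[p], occurrences[p]) for p in support if support[p] >= min_support}


# Toy data set (invented, not taken from BioModels).
models = {
    "toy_a": [Reaction(("S",), ("P",), ("E",)), Reaction(("P",), ())],  # catalysis, degradation
    "toy_b": [Reaction(("A",), ("B",), ("E",)), Reaction(("B",), ("C",), ("E",))],
    "toy_c": [Reaction(("X", "Y"), ("XY",))],  # complex formation
}

for pattern, (sup, occ) in sorted(mine(models, min_support=2).items()):
    print(pattern, sup, occ)  # prints "(1, 1, 1) 2 3"
```

Support is counted once per model, as in transaction-based frequent subgraph mining, while the second number reports total occurrences. A general miner would instead grow patterns edge by edge across several reactions and use canonical codes, such as gSpan's minimum DFS codes, to avoid counting isomorphic patterns twice.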
