Identifying frequent patterns in biochemical reaction networks: a workflow

Abstract: Computational models in biology encode molecular and cell biological processes. Many of these models can be represented as biochemical reaction networks. When studying such networks, researchers are mostly interested in systems that share similar reactions and mechanisms. Typical goals of an investigation thus include understanding model parts, identifying recurring patterns and recognizing biologically relevant motifs. The large number and size of available models, however, require automated methods to support researchers in achieving these goals. For the problem of finding patterns in large networks in particular, only partial solutions exist. We propose a workflow that identifies frequent structural patterns in biochemical reaction networks encoded in the Systems Biology Markup Language (SBML). The workflow uses a subgraph mining algorithm to detect the network patterns. Once patterns are identified, their textual description can be converted automatically into a graphical representation. Furthermore, information about the distribution of patterns among a selected set of models can be retrieved. The workflow was validated with 575 models from the curated branch of BioModels. In this paper, we highlight interesting and frequent structural patterns and provide exemplary patterns that incorporate terms from the Systems Biology Ontology. Our workflow can be applied to a custom set of models or to models already stored in our graph database MaSyMoS. The occurrences of frequent patterns may give insight into the encoding of central biological processes, help evaluate postulated biological motifs or serve as a similarity measure for models that share common structures. Database URL: https://github.com/FabienneL/BioNet-Mining
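The idea of a pattern being "frequent" across a set of models can be illustrated with a minimal sketch. It is not the subgraph mining algorithm used in the workflow: it only counts how many models contain at least one reaction with a given shape (number of reactants, products and modifiers), which is a much simpler stand-in for structural patterns. The toy models, the function names `reaction_signature` and `frequent_signatures`, and the `min_support` parameter are all assumptions made for illustration.

```python
from collections import Counter

# Hypothetical toy models (not taken from BioModels): each reaction is a
# tuple of (reactants, products, modifiers).
models = {
    "model_a": [
        (("S",), ("P",), ("E",)),
        (("A", "B"), ("C",), ()),
    ],
    "model_b": [
        (("X",), ("Y",), ("K",)),
        (("Y",), ("Z",), ()),
    ],
    "model_c": [
        (("G",), ("H",), ("E1",)),
    ],
}


def reaction_signature(reaction):
    """Reduce a reaction to its shape: (#reactants, #products, #modifiers)."""
    reactants, products, modifiers = reaction
    return (len(reactants), len(products), len(modifiers))


def frequent_signatures(models, min_support):
    """Return signatures that occur in at least `min_support` models.

    Support is counted per model, so a signature that appears several
    times within one model still contributes only once.
    """
    support = Counter()
    for reactions in models.values():
        for signature in {reaction_signature(r) for r in reactions}:
            support[signature] += 1
    return {sig: n for sig, n in support.items() if n >= min_support}


result = frequent_signatures(models, min_support=2)
print(result)
```

In this toy data set, only the shape "one reactant, one product, one modifier" occurs in at least two models. A real subgraph mining approach would additionally consider how reactions are connected through shared species.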
