Annotation-based feature extraction from sets of SBML models

Background: Model repositories such as BioModels Database provide computational models of biological systems for the scientific community. These models contain rich semantic annotations that link model entities to concepts in well-established bio-ontologies such as Gene Ontology. Consequently, thematically similar models are likely to share similar annotations. Based on this assumption, we argue that semantic annotations are a suitable tool to characterize sets of models. These characteristics improve model classification, allow the identification of additional features for model retrieval tasks, and enable the comparison of sets of models.

Results: In this paper we discuss four methods for annotation-based feature extraction from model sets. We tested all methods on sets of models in SBML format that were composed from BioModels Database. To characterize each of these sets, we analyzed and extracted concepts from three frequently used ontologies, namely Gene Ontology, ChEBI and SBO. We find that three of the four methods are suitable to determine characteristic features for arbitrary sets of models: the selected features vary depending on the underlying model set, and they are also specific to the chosen model set. We show that the identified features map to concepts that are higher up in the hierarchy of the ontologies than the concepts used for model annotations. Our analysis also reveals that the information content of concepts in ontologies and their usage for model annotation do not correlate.

Conclusions: Annotation-based feature extraction enables the comparison of model sets, as opposed to existing methods for model-to-keyword comparison or model-to-model comparison.
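To make the general idea concrete, the following is a minimal sketch of one way characteristic concepts could be extracted from a model set. It is not one of the four methods discussed in the paper, whose details are not given in this abstract. It assumes each model is reduced to the set of ontology concepts it is annotated with, propagates every annotation to its ancestor concepts, and ranks concepts by how much more often they occur in the chosen set than in a background collection. The ontology fragment, the concept identifiers, and the function names are all hypothetical placeholders.

```python
from collections import Counter

# Hypothetical toy ontology fragment: concept -> direct parents (is_a links).
# The identifiers are illustrative placeholders, not real Gene Ontology IDs.
PARENTS = {
    "GO:mitotic_cell_cycle": {"GO:cell_cycle"},
    "GO:cell_cycle": {"GO:biological_process"},
    "GO:apoptosis": {"GO:cell_death"},
    "GO:cell_death": {"GO:biological_process"},
    "GO:biological_process": set(),
}


def with_ancestors(concepts):
    """Return the given concepts together with all of their ancestors."""
    seen = set()
    stack = list(concepts)
    while stack:
        concept = stack.pop()
        if concept not in seen:
            seen.add(concept)
            stack.extend(PARENTS.get(concept, ()))
    return seen


def model_frequency(models):
    """Fraction of models annotated with each concept or one of its descendants."""
    counts = Counter()
    for annotations in models:
        counts.update(with_ancestors(annotations))
    return {concept: n / len(models) for concept, n in counts.items()}


def characteristic_features(model_set, background, top=3):
    """Rank concepts by frequency in model_set minus frequency in background."""
    foreground = model_frequency(model_set)
    reference = model_frequency(background)
    scored = {c: f - reference.get(c, 0.0) for c, f in foreground.items()}
    # Sort by descending score; break ties alphabetically for stable output.
    return sorted(scored.items(), key=lambda item: (-item[1], item[0]))[:top]


# Hypothetical example data: two cell-cycle models within a four-model background.
cell_cycle_set = [{"GO:mitotic_cell_cycle"}, {"GO:cell_cycle"}]
background = cell_cycle_set + [{"GO:apoptosis"}, {"GO:cell_death"}]
features = characteristic_features(cell_cycle_set, background)
print(features)
```

In this toy example the top-ranked concept is the shared parent rather than the more specific concept used to annotate one of the models, which is in line with the observation that identified features tend to sit higher in the ontology hierarchy than the annotations themselves. The root concept scores zero because it occurs in every model of both collections.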