Combining computational models, semantic annotations, and associated simulation experiments in a graph database

Model repositories such as the BioModels Database or the CellML Model Repository are frequently accessed to retrieve computational models describing biological systems. However, the current designs of these databases limit the types of supported queries, and much of the data in these repositories cannot easily be accessed. Computational methods for model retrieval cannot be applied. In this paper we present a storage concept that meets this challenge. It is built on a graph database, reflects the models' structure, incorporates semantic annotations and experiment descriptions, and ultimately connects different types of model-related data. The connections between heterogeneous model-related data and bio-ontologies enable efficient search via biological facts and grant access to new model features such as network structure. The introduced concept notably improves access to computational models and associated simulations in a model repository. This has positive effects on tasks such as model search, retrieval, ranking, matching, and filtering. We exemplify how CellML- and SBML-encoded models can be maintained in one database, how these models can be linked via annotations, and how they can be queried.

PeerJ PrePrints | http://dx.doi.org/10.7287/peerj.preprints.376v2 | CC-BY 4.0 Open Access | rec: 6 Oct 2014, publ: 6 Oct 2014
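The idea of linking models from different formats through shared annotations can be illustrated with a small sketch. It is not the implementation described in the paper: plain Python dictionaries stand in for the graph database, and all node ids, relation names (`HAS_ENTITY`, `ANNOTATED_WITH`), and the term `ontology:ATP` are hypothetical placeholders chosen for illustration.

```python
from collections import defaultdict


class ModelGraph:
    """Toy in-memory property graph (an assumption, not the paper's database)."""

    def __init__(self):
        self.nodes = {}                # node id -> property dict
        self.edges = defaultdict(set)  # (source id, relation) -> set of target ids

    def add_node(self, node_id, **props):
        self.nodes[node_id] = props

    def add_edge(self, source, relation, target):
        self.edges[(source, relation)].add(target)

    def models_annotated_with(self, term):
        """Return sorted ids of models owning an entity annotated with `term`."""
        hits = set()
        for (source, relation), entities in self.edges.items():
            if relation != "HAS_ENTITY":
                continue
            for entity in entities:
                # .get avoids creating new keys while iterating over the dict
                if term in self.edges.get((entity, "ANNOTATED_WITH"), ()):
                    hits.add(source)
        return sorted(hits)


# Hypothetical example data: one SBML model and one CellML model.
graph = ModelGraph()
graph.add_node("sbml_model", format="SBML")
graph.add_node("cellml_model", format="CellML")
graph.add_node("sbml_species_atp", kind="species")
graph.add_node("cellml_variable_atp", kind="variable")
graph.add_node("ontology:ATP", kind="ontology_term")  # placeholder term id

# Each model owns one entity; both entities point to the same ontology term.
graph.add_edge("sbml_model", "HAS_ENTITY", "sbml_species_atp")
graph.add_edge("cellml_model", "HAS_ENTITY", "cellml_variable_atp")
graph.add_edge("sbml_species_atp", "ANNOTATED_WITH", "ontology:ATP")
graph.add_edge("cellml_variable_atp", "ANNOTATED_WITH", "ontology:ATP")

result = graph.models_annotated_with("ontology:ATP")
print(result)  # ['cellml_model', 'sbml_model']
```

Because both entities reference the same term node, a single query by biological concept returns models from both formats, which is the kind of cross-format retrieval the abstract describes.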
