An algorithm to detect and communicate the differences in computational models describing biological systems

Motivation: Repositories support the reuse of models and ensure transparency about results in publications linked to those models. With thousands of models available in repositories such as the BioModels Database or the Physiome Model Repository, a framework to track the differences between models and their versions is essential to compare and combine models. Difference detection not only allows users to study the history of models but also helps in the detection of errors and inconsistencies. Existing repositories lack algorithms to track a model’s development over time.

Results: Focusing on SBML and CellML, we present an algorithm to accurately detect and describe differences between coexisting versions of a model with respect to (i) the models’ encoding, (ii) the structure of biological networks and (iii) mathematical expressions. This algorithm is implemented in a comprehensive, open-source library called BiVeS. BiVeS helps to identify and characterize changes in computational models and thereby contributes to the documentation of a model’s history. Our work facilitates the reuse and extension of existing models and supports collaborative modelling. Finally, it contributes to better reproducibility of modelling results and helps address the challenge of model provenance.

Availability and implementation: The workflow described in this article is implemented in BiVeS. BiVeS is freely available as source code and binary from sems.uni-rostock.de. The web interface BudHat demonstrates the capabilities of BiVeS at budhat.sems.uni-rostock.de.

Contact: martin.scharm@uni-rostock.de

Supplementary information: Supplementary data are available at Bioinformatics online.
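The Results paragraph describes difference detection only at a high level. The Python sketch below is a hedged illustration of the simplest of the three aspects, encoding-level differences (i). It uses the standard-library xml.etree.ElementTree to match the species of two SBML versions by their id attribute and reports deletions, insertions and attribute updates. It is a deliberately simplified stand-in, not the algorithm implemented in BiVeS. The function names species_by_id and diff_species and the two toy model strings are hypothetical. Matching relies on ids alone, and changes to network structure (ii) or mathematical expressions (iii) are not covered.

```python
import xml.etree.ElementTree as ET


def species_by_id(sbml_text):
    """Map each SBML <species> id to a dict of its XML attributes."""
    root = ET.fromstring(sbml_text)
    species = {}
    for elem in root.iter():
        # Compare local names only, so the SBML namespace URI does not matter.
        if elem.tag.rsplit("}", 1)[-1] == "species":
            species[elem.get("id")] = dict(elem.attrib)
    return species


def diff_species(old_text, new_text):
    """Report deleted, inserted and updated species between two model versions.

    Hypothetical helper: species are matched by their id attribute only,
    which is a simplifying assumption of this sketch.
    """
    old, new = species_by_id(old_text), species_by_id(new_text)
    changes = []
    for sid in sorted(old.keys() - new.keys()):
        changes.append(("delete", sid))
    for sid in sorted(new.keys() - old.keys()):
        changes.append(("insert", sid))
    for sid in sorted(old.keys() & new.keys()):
        # Compare every attribute present in either version of the species.
        for attr in sorted(old[sid].keys() | new[sid].keys()):
            before, after = old[sid].get(attr), new[sid].get(attr)
            if before != after:
                changes.append(("update", sid, attr, before, after))
    return changes


# Two hypothetical toy versions of the same model.
V1 = (
    '<sbml xmlns="http://www.sbml.org/sbml/level3/version1/core" level="3" version="1">'
    '<model id="m"><listOfSpecies>'
    '<species id="A" compartment="c" initialConcentration="1.0"/>'
    '<species id="B" compartment="c" initialConcentration="0.5"/>'
    '</listOfSpecies></model></sbml>'
)
V2 = (
    '<sbml xmlns="http://www.sbml.org/sbml/level3/version1/core" level="3" version="1">'
    '<model id="m"><listOfSpecies>'
    '<species id="A" compartment="c" initialConcentration="2.0"/>'
    '<species id="C" compartment="c" initialConcentration="0.1"/>'
    '</listOfSpecies></model></sbml>'
)

if __name__ == "__main__":
    for change in diff_species(V1, V2):
        print(change)
```

Running the script prints one tuple per change: B is deleted, C is inserted, and the initial concentration of A is updated from 1.0 to 2.0. A practical tool would additionally need to match elements without stable ids and to inspect reactions and MathML content, all of which this sketch ignores.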
