Guidelines for Reproducibly Building and Simulating Systems Biology Models

Objective: Reproducibility is the cornerstone of the scientific method. However, many current systems biology models cannot easily be reproduced. This paper presents methods that address this problem. Methods: We analyzed the recent Mycoplasma genitalium whole-cell (WC) model to determine the requirements for reproducible modeling. Results: We determined that reproducible modeling requires both repeatable model building and repeatable simulation. Conclusion: New standards and simulation software tools are needed to enhance and verify the reproducibility of modeling. Specifically, new standards are needed to explicitly document every data source and assumption, and new deterministic parallel simulation tools are needed to quickly simulate large, complex models. Significance: We anticipate that these new standards and software will enable researchers to reproducibly build and simulate more complex models, including WC models.
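
As an illustration of what repeatable simulation means for a stochastic model, the following minimal sketch runs a Gillespie-style simulation of a birth-death process twice with the same seed and checks that the two trajectories are identical. It is not taken from the WC model or from any tool discussed in the paper; the function name `simulate_birth_death`, the rate parameters `k_birth` and `k_death`, and the seed values are all hypothetical and chosen only for this example.

```python
import random


def simulate_birth_death(seed, k_birth=1.0, k_death=0.1, x0=0, t_end=10.0):
    """Gillespie-style simulation of a birth-death process (illustrative only).

    All parameter names and default values are assumptions made for this sketch.
    """
    # A private, explicitly seeded generator avoids hidden global random state,
    # so the trajectory depends only on the arguments passed in.
    rng = random.Random(seed)
    t, x = 0.0, x0
    trajectory = [(t, x)]
    while True:
        birth = k_birth          # propensity of the birth reaction
        death = k_death * x      # propensity of the death reaction
        total = birth + death    # always > 0 because k_birth > 0
        t += rng.expovariate(total)  # waiting time to the next reaction
        if t > t_end:
            break
        # Choose which reaction fires, in proportion to its propensity.
        if rng.random() * total < birth:
            x += 1
        else:
            x -= 1
        trajectory.append((t, x))
    return trajectory


run_a = simulate_birth_death(seed=42)
run_b = simulate_birth_death(seed=42)

# Same seed and same parameters give the same trajectory, event for event.
print(run_a == run_b)
```

Recording the seed alongside the model parameters is one simple way to make a single-threaded run repeatable. A parallel simulator would additionally need a deterministic ordering of events across threads, which this sketch does not address.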
