How Modeling Standards, Software, and Initiatives Support Reproducibility in Systems Biology and Systems Medicine

Objective: Only reproducible results are of significance to science. The lack of suitable standards, and of appropriate support for standards in software tools, has led to numerous publications with irreproducible results. Our objectives are to identify the key challenges of reproducible research and to highlight existing solutions. Results: We summarize problems concerning reproducibility in systems biology and systems medicine, focusing on initiatives, standards, and software tools that aim to improve the reproducibility of simulation studies. Conclusions: The long-term success of systems biology and systems medicine depends on trustworthy models and simulations. This requires openness to ensure reusability, and transparency to enable reproducibility of results in these fields.
