Our path to better science in less time using open data science tools

Reproducibility has long been a tenet of science but has been challenging to achieve—we learned this the hard way when our old approaches proved inadequate to efficiently reproduce our own work. Here we describe how several free software tools have fundamentally upgraded our approach to collaborative research, making our entire workflow more transparent and streamlined. By describing specific tools and how we incrementally began using them for the Ocean Health Index project, we hope to encourage others in the scientific community to do the same—so we can all produce better science in less time.
