A Realistic Guide to Making Data Available Alongside Code to Improve Reproducibility

Data makes science possible. Sharing data improves visibility and makes the research process transparent, which increases trust in the work and allows for independent reproduction of results. However, a large proportion of data from published research is available only to the original authors. Despite the obvious benefits of sharing data, and despite scientists advocating for its importance, most advice on data sharing discusses the broader benefits rather than the practical considerations. This paper provides practical, actionable advice on how to share data alongside research. The key message is that data sharing falls on a continuum, and entering that continuum should come with minimal barriers.
