The Role of Data Stewardship in Software Sustainability and Reproducibility

Software and computational tools are instrumental to scientific investigation in today's digitized research environment. Despite this crucial role, implementing best practices that make research software reproducible and sustainable remains challenging. Delft University of Technology recently launched a novel data stewardship initiative that provides disciplinary support for research data management. One of its main aims is to make scientific results reproducible in general. In this paper, we explore the potential of data stewardship to support software reproducibility and sustainability as well. We brought the key stakeholders (researchers, research software engineers, and data stewards) together in a workshop. Our goals were to understand the challenges and barriers, identify the support needed to achieve software sustainability and reproducibility, and establish how the three parties can work together efficiently. Based on the insights from the workshop and on our professional experience as data stewards, we draw conclusions on possible ways forward. We focus on how coordinated efforts by the key stakeholders can achieve the important goal of software reproducibility and sustainability.