Key components of data publishing: using current best practices to develop a reference model for data publishing

The availability of workflows for data publishing could have an enormous impact on researchers, research practices and publishing paradigms, as well as on funding strategies and on career and research evaluations. We present the generic components of such workflows to provide a reference model for these stakeholders. The RDA-WDS Data Publishing Workflows group set out to study the current data-publishing workflow landscape across disciplines and institutions. A diverse set of workflows was examined to identify common components and standard practices, including basic self-publishing services, institutional data repositories, long-term projects, curated data repositories, and joint data journal and repository arrangements. The results of this examination have been used to derive a data-publishing reference model comprising generic components. From an assessment of the current data-publishing landscape, which is clearly varied and dynamic, we highlight important gaps and challenges to consider, especially when dealing with more complex workflows and their integration into wider community frameworks. The different components of a data-publishing system need to work, to the greatest extent possible, in a seamless and integrated way, both to support the evolution of commonly understood and utilized standards and, eventually, to increase reproducibility. We therefore advocate the implementation of existing standards for repositories and for all parts of the data-publishing process, and the development of new standards where necessary. Effective and trustworthy data publishing should be embedded in documented workflows. As more research communities seek to publish the data associated with their research, they can build on one or more of the components identified in this reference model.
