Connecting Data Publication to the Research Workflow: A Preliminary Analysis

The data curation community has long encouraged researchers to document collected research data during active stages of the research workflow, to provide robust metadata earlier, and to support research data publication and preservation. Data documentation with robust metadata is one of a number of steps in effective data publication. Data publication is the process of making digital research objects ‘FAIR’ (findable, accessible, interoperable, and reusable), attributes increasingly expected by research communities, funders, and society. Research data publishing workflows are the means to that end. Currently, however, much published research data remains inconsistently and inadequately documented by researchers. Documenting data closer in time to its collection would help mitigate the high cost that repositories associate with the ingest process. More effective data publication and sharing should in principle result from early interactions between researchers and their selected data repository. This paper describes a short study undertaken by members of the Research Data Alliance (RDA) and World Data System (WDS) working group on Publishing Data Workflows. We present a collection of recent examples of data publication workflows that connect data repositories and publishing platforms with research activity ‘upstream’ of the ingest process. We re-articulate previous recommendations of the working group to account for the varied upstream service components and platforms that support the flow of contextual and provenance information downstream. These workflows should be open and loosely coupled to support interoperability, including with preservation and publication environments. Our recommendations aim to stimulate further work on researchers’ views of data publishing and on the extent to which available services and infrastructure facilitate the publication of FAIR data. We also aim to stimulate further dialogue about, and definition of, the roles and responsibilities of research data services and platform providers for the ‘FAIRness’ of research data publication workflows themselves.
