Data Conservancy Provenance, Context, and Lineage Services: Key Components for Data Preservation and Curation

Institutional data management infrastructures must provide several key services, among them provenance and lineage tracking and the ability to associate data with the contextual information needed to understand and use it. These capabilities are critical for addressing several issues that data collectors and users face, including trust in data, traceability of results, data transparency, and support for data citation. In this paper, we describe how the Data Conservancy Service (DCS) software supports these services. The DCS provenance, context, and lineage services span all four layers of the DCS data curation stack model: storage, archiving, preservation, and curation.
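At its core, lineage tracking records which data items were derived from which others, so that any derived product can be traced back to its sources. Context can then be attached to each item as metadata. The sketch below illustrates only that idea, under our own assumptions. It is not the DCS implementation or API. The class `LineageGraph`, its methods, and the example item names are hypothetical. The derivation relation loosely mirrors the `wasDerivedFrom` relation of W3C PROV.

```python
from collections import defaultdict, deque
from dataclasses import dataclass, field


@dataclass
class LineageGraph:
    """Hypothetical lineage store (not the DCS API): maps each item to its direct sources."""

    parents: dict = field(default_factory=lambda: defaultdict(set))
    context: dict = field(default_factory=dict)

    def record_derivation(self, derived: str, source: str) -> None:
        # Record that `derived` was produced from `source` (a wasDerivedFrom-style edge).
        self.parents[derived].add(source)

    def attach_context(self, item: str, **metadata) -> None:
        # Associate contextual metadata (e.g. instrument, units) with a data item.
        self.context.setdefault(item, {}).update(metadata)

    def lineage(self, item: str) -> list:
        # Breadth-first walk over derivation edges, returning every ancestor of `item`.
        # `seen` starts with `item` so cycles cannot list an item as its own ancestor.
        seen, order = {item}, []
        queue = deque([item])
        while queue:
            current = queue.popleft()
            for parent in sorted(self.parents.get(current, ())):
                if parent not in seen:
                    seen.add(parent)
                    order.append(parent)
                    queue.append(parent)
        return order


# Usage with hypothetical item names: a gridded product built from two raw observation sets.
g = LineageGraph()
g.record_derivation("gridded_v2", "gridded_v1")
g.record_derivation("gridded_v1", "raw_obs_A")
g.record_derivation("gridded_v1", "raw_obs_B")
g.attach_context("raw_obs_A", instrument="hypothetical-sensor", units="K")
print(g.lineage("gridded_v2"))  # ['gridded_v1', 'raw_obs_A', 'raw_obs_B']
```

A production service would also record the activities and agents responsible for each derivation and persist the graph across the storage and archiving layers. The in-memory graph here only shows the traversal that makes results traceable.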
