Encoding Provenance Metadata for Social Science Datasets

Recording provenance is a key requirement for data-centric scholarship, allowing researchers to evaluate the integrity of source datasets and to reproduce, and thereby validate, results. Provenance has become even more critical in the web environment, in which data from distributed sources and of varying integrity can be combined and derived. Recent work by the W3C on the PROV model provides the foundation for semantically rich, interoperable, and web-compatible provenance metadata. We apply that model to complex but characteristic provenance examples from social science data, describe scenarios that make scholarly use of those provenance descriptions, and propose a method for encoding this provenance metadata within the widely used Data Documentation Initiative (DDI) metadata standard.
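
To make the PROV vocabulary concrete, the following is a minimal sketch, not the encoding proposed here. It hand-rolls a tiny subset of the PROV core (entities, activities, agents, and the used, wasGeneratedBy, wasDerivedFrom and wasAssociatedWith relations) in plain Python and serializes it to PROV-N text. It does not show how such statements would be embedded in DDI. Every identifier (ex:rawSurvey, ex:anonymize, ex:statisticalAgency, and so on), the placeholder namespace http://example.org/, and the ProvDocument class with its lineage helper are hypothetical and exist only for illustration. The lineage query hints at one scholarly use: tracing which sources a combined dataset ultimately depends on.

```python
from dataclasses import dataclass, field


@dataclass
class ProvDocument:
    """Hypothetical, minimal in-memory subset of the W3C PROV core model."""
    entities: list = field(default_factory=list)
    activities: list = field(default_factory=list)
    agents: list = field(default_factory=list)
    # Each relation is stored as (relation name, subject, object).
    relations: list = field(default_factory=list)

    def used(self, activity: str, entity: str) -> None:
        self.relations.append(("used", activity, entity))

    def was_generated_by(self, entity: str, activity: str) -> None:
        self.relations.append(("wasGeneratedBy", entity, activity))

    def was_derived_from(self, derived: str, source: str) -> None:
        self.relations.append(("wasDerivedFrom", derived, source))

    def was_associated_with(self, activity: str, agent: str) -> None:
        self.relations.append(("wasAssociatedWith", activity, agent))

    def lineage(self, entity: str) -> set:
        """Return every entity that `entity` was transitively derived from."""
        sources, stack = set(), [entity]
        while stack:
            current = stack.pop()
            for rel, derived, source in self.relations:
                if rel == "wasDerivedFrom" and derived == current and source not in sources:
                    sources.add(source)
                    stack.append(source)
        return sources

    def to_prov_n(self) -> str:
        """Serialize to PROV-N text; '-' marks an omitted optional argument."""
        lines = ["document", "  prefix ex <http://example.org/>"]
        lines += [f"  entity({e})" for e in self.entities]
        lines += [f"  activity({a})" for a in self.activities]
        lines += [f"  agent({g})" for g in self.agents]
        for rel, subj, obj in self.relations:
            # used/wasGeneratedBy take an optional time, wasAssociatedWith an optional plan.
            suffix = "" if rel == "wasDerivedFrom" else ", -"
            lines.append(f"  {rel}({subj}, {obj}{suffix})")
        lines.append("endDocument")
        return "\n".join(lines)


# Hypothetical scenario: an agency anonymizes raw survey data into a public-use
# file, which a researcher later links with (hypothetical) administrative records.
example = ProvDocument(
    entities=["ex:rawSurvey", "ex:publicUseFile", "ex:adminRecords", "ex:linkedFile"],
    activities=["ex:anonymize"],
    agents=["ex:statisticalAgency"],
)
example.used("ex:anonymize", "ex:rawSurvey")
example.was_generated_by("ex:publicUseFile", "ex:anonymize")
example.was_associated_with("ex:anonymize", "ex:statisticalAgency")
example.was_derived_from("ex:publicUseFile", "ex:rawSurvey")
example.was_derived_from("ex:linkedFile", "ex:publicUseFile")
example.was_derived_from("ex:linkedFile", "ex:adminRecords")

print(example.to_prov_n())
print(sorted(example.lineage("ex:linkedFile")))
```

Here the lineage of ex:linkedFile resolves to ex:adminRecords, ex:publicUseFile and ex:rawSurvey, so a reader assessing the linked file can see that it rests, through the anonymization step, on the raw survey. A real deployment would more likely use PROV-O in RDF or an established PROV library rather than this hand-written serializer.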