CED2AR: The Comprehensive Extensible Data Documentation and Access Repository

We describe the design, implementation, and deployment of the Comprehensive Extensible Data Documentation and Access Repository (CED2AR), a metadata repository system that allows researchers to search, browse, access, and cite confidential data and metadata, either through a web-based user interface or programmatically through a search API, while reusing and linking to existing archive- and provider-generated metadata. CED2AR is distinguished from other metadata repository-based applications by requirements that derive from its social science context, including the need to cloak confidential data and metadata and to manage complex provenance chains.
