LOP: capturing and linking open provenance on LOD cycle

The Web of Data has emerged as a means to expose, share, reuse, and connect information on the Web, identified by URIs and using RDF as a data model, following the Linked Data principles. However, the reuse of third-party data can be compromised without proper data quality assessment. In this context, important questions emerge: how can one trust published data and links? Which manipulation, modification, and integration operations were applied to the data before its publication? What is the nature of the comparisons or transformations applied to data during the interlinking process? In this scenario, provenance becomes a fundamental element. In this paper, we describe an approach for generating and capturing Linked Open Provenance (LOP) to support data quality and trustworthiness assessments, covering everything from the preparation and format transformation of traditional data sources to dataset publication and interlinking. The proposed architecture uses provenance agents, orchestrated by an ETL workflow, to collect provenance at any specified level and link it to its corresponding data. We also describe a real use case in which the architecture was implemented to evaluate the proposal.
