Linked Data Provenance: State of the Art and Challenges

Linked Open Data (LOD) is rapidly emerging as a way to publish and share structured data on the Semantic Web using URIs and RDF in many application domains, such as fisheries, health, environment, education, and agriculture. Because different datasets in the LOD Cloud use different schemas with the same semantics, managing semantic heterogeneity among these schemas is a growing problem. Schema-level mapping among LOD Cloud datasets is necessary because instance-level mapping is not a feasible way to make knowledge discovery easy and systematic. Schema-level mapping provenance is in turn necessary to interpret query results over the integrated dataset correctly. In this paper, we review existing approaches to the representation, storage, and querying of linked data provenance, as well as applications of linked data provenance, where mapping is at the instance level. Analyzing these approaches helps reveal open research problems in linked data provenance where mapping is at the schema level. Furthermore, we explain how schema-level mapping provenance in linked data can be used to facilitate data integration and data mining, and to ensure quality and trust in data.
