Using Web Data Provenance for Quality Assessment

The Web of Data cannot be a trustworthy data source unless an approach for evaluating the quality of data on the Web is established and integrated into the data publication and access process. In this paper, we propose an approach that uses provenance information about data on the Web to assess their quality and trustworthiness. Our contributions include a model for Web data provenance and an assessment method that can be adapted to specific quality criteria. We demonstrate how this method can be used to evaluate the timeliness of data on the Web, that is, how up-to-date the data are. We also propose a possible solution for dealing with missing provenance information: associating certainty values with the calculated quality values.
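To make the idea of a timeliness score paired with a certainty value concrete, the following is a minimal sketch and not the paper's actual method. It assumes a commonly used formulation of timeliness, max(0, 1 - age / volatility), and assumes that certainty can be approximated as the fraction of provenance records for which a creation time is known. The names `ProvenanceRecord`, `assess_timeliness`, `volatility`, and `default_score` are hypothetical and introduced only for illustration.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import List, Optional, Tuple


@dataclass
class ProvenanceRecord:
    """Hypothetical provenance element; created_at is None when unknown."""
    created_at: Optional[datetime]


def assess_timeliness(
    records: List[ProvenanceRecord],
    now: datetime,
    volatility: timedelta,
    default_score: float = 0.5,
) -> Tuple[float, float]:
    """Return (timeliness, certainty), both in [0, 1].

    Assumption: timeliness of one record is max(0, 1 - age / volatility).
    Assumption: certainty is the share of records with a known creation time.
    """
    known = [r for r in records if r.created_at is not None]
    if not known:
        # No usable provenance: fall back to a neutral score with zero certainty.
        return default_score, 0.0

    scores = []
    for record in known:
        age = now - record.created_at
        # Dividing two timedeltas yields a float ratio.
        scores.append(max(0.0, 1.0 - age / volatility))

    timeliness = sum(scores) / len(scores)
    certainty = len(known) / len(records)
    return timeliness, certainty


now = datetime(2010, 1, 10)
records = [
    ProvenanceRecord(datetime(2010, 1, 5)),   # 5 days old
    ProvenanceRecord(datetime(2010, 1, 10)),  # created just now
    ProvenanceRecord(None),                   # creation time missing
]
score, certainty = assess_timeliness(records, now, timedelta(days=10))
print(score, certainty)
```

Averaging over the known records is only one possible design choice; a more conservative assessment could take the minimum score instead, and the certainty value lets a consumer decide how much weight to give a result computed from incomplete provenance.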
