Using SPARQL and SPIN for Data Quality Management on the Semantic Web

The quality of data is a key factor in the performance of information systems, in particular with regard to (1) the number of exceptions in the execution of business processes and (2) the quality of decisions based on the output of the respective information system. Recently, the Semantic Web and Linked Data activities have started to provide substantial data resources that may be used for real business operations. Hence, it will soon be critical to manage the quality of such data. Unfortunately, a wide range of data quality problems can be observed in Semantic Web data. In this paper, we (1) evaluate how the state of the art in data quality research fits the characteristics of the Web of Data, (2) describe how the SPARQL query language and the SPARQL Inferencing Notation (SPIN) can be used to identify data quality problems in Semantic Web data automatically, entirely within the Semantic Web technology stack, and (3) evaluate our approach.
