On the detection of inconsistencies in RDF data sets and their correction at ontological level

The rapid development of Linked Data leads to a proliferation of errors in published data, primarily related to inconsistencies between data instances and their related ontologies. This problem undermines the reliability of Semantic Web applications that analyze or exploit heterogeneous RDF data sets. In this article, we focus on correcting inconsistencies caused by the domain and the range of a property. We present an algorithm that identifies the source of these inconsistencies in the ontology and provides guidelines to correct or improve the ontology. The localization of the inconsistencies is based on a quantitative comparison between the domain and range classes defined in the ontology and those built by exhaustively analyzing the instances used as subject or object of the properties. We show the usefulness of this method in a case study involving DBpedia, where we use our approach to diagnose and correct common inconsistencies. Another experiment, conducted on a large data set generated by SP2Bench, validates the scalability of the proposed algorithm.
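The quantitative comparison described above can be illustrated with a minimal sketch. It is not the algorithm of the article: it only counts, for each property, the classes of the instances used as subject and measures how often the declared domain is actually observed. The function names, the `ex:` identifiers, and the toy data are all hypothetical, and subclass reasoning is deliberately ignored. The range would be handled symmetrically on the object position.

```python
from collections import Counter, defaultdict


def observed_domains(triples, types):
    """Count, per property, the classes of the instances used as subject."""
    counts = defaultdict(Counter)
    for subject, prop, _ in triples:
        for cls in types.get(subject, ()):
            counts[prop][cls] += 1
    return counts


def domain_support(triples, types, declared_domain):
    """Fraction of each property's uses whose subject has the declared domain class."""
    totals = Counter(prop for _, prop, _ in triples)
    observed = observed_domains(triples, types)
    return {
        prop: observed[prop][declared_domain[prop]] / totals[prop]
        for prop in totals
        if prop in declared_domain
    }


# Hypothetical toy data: the ontology declares ex:Person as the domain of
# ex:mayor, but the instances show that subjects are mostly cities.
triples = [
    ("ex:Paris", "ex:mayor", "ex:Hidalgo"),
    ("ex:Lyon", "ex:mayor", "ex:Doucet"),
    ("ex:Acme", "ex:mayor", "ex:Smith"),
]
types = {
    "ex:Paris": {"ex:City"},
    "ex:Lyon": {"ex:City"},
    "ex:Acme": {"ex:Company"},
}
declared_domain = {"ex:mayor": "ex:Person"}

support = domain_support(triples, types, declared_domain)
candidates = observed_domains(triples, types)["ex:mayor"].most_common()
```

In this toy case the declared domain is never observed, while `ex:City` covers two of the three uses. A gap of this kind suggests that the error lies in the ontology rather than in the individual instances.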
