Supervised tracking and correction of inconsistencies in RDF data

The rapid development of Linked Data is hampered by a growing number of errors in published data, mainly inconsistencies with domain ontologies. This problem undermines the reliability of applications that analyze independent and heterogeneous RDF data. It is therefore important to improve the quality of published data in order to promote the development of the Semantic Web. We introduce a process designed to detect and identify specific inconsistencies in published data and to fix them at the ontological level. We propose a method to quantitatively evaluate the domain and range issues of a property. A diagnosis is then established and used to drive a supervised process for the correction (or possibly the enhancement) of the ontology. We show the value of this method in a case study involving DBpedia. The results of experiments carried out on very large data sets generated with the SP2Bench RDF benchmark validate the applicability and scalability of our method.
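
As a rough illustration of what a quantitative evaluation of domain and range issues could look like, the sketch below counts, for one property, the share of its uses whose subject or object is not typed with the declared class. It is not the method of this paper: the function name `violation_rates`, the `ex:` identifiers, and the toy triples are hypothetical, subclass reasoning is ignored, and treating untyped resources as violations is an assumption made only to keep the example short.

```python
from collections import defaultdict

RDF_TYPE = "rdf:type"  # shorthand for the rdf:type predicate in this toy data


def violation_rates(triples, prop, domain, rng):
    """Return (domain_rate, range_rate) for `prop`.

    Each rate is the fraction of uses of `prop` whose subject (resp. object)
    is not explicitly typed with the declared domain (resp. range) class.
    Assumption: no subclass inference; untyped resources count as violations.
    """
    # Collect the explicitly asserted types of every resource.
    types = defaultdict(set)
    for s, p, o in triples:
        if p == RDF_TYPE:
            types[s].add(o)

    # Every (subject, object) pair linked by the property under study.
    uses = [(s, o) for s, p, o in triples if p == prop]
    if not uses:
        return 0.0, 0.0

    bad_domain = sum(1 for s, _ in uses if domain not in types.get(s, set()))
    bad_range = sum(1 for _, o in uses if rng not in types.get(o, set()))
    return bad_domain / len(uses), bad_range / len(uses)


# Hypothetical data: one of the two uses of ex:capitalOf has a Person as subject.
triples = [
    ("ex:Paris", RDF_TYPE, "ex:City"),
    ("ex:France", RDF_TYPE, "ex:Country"),
    ("ex:Hugo", RDF_TYPE, "ex:Person"),
    ("ex:Paris", "ex:capitalOf", "ex:France"),
    ("ex:Hugo", "ex:capitalOf", "ex:France"),
]

rates = violation_rates(triples, "ex:capitalOf", "ex:City", "ex:Country")
print(rates)  # (0.5, 0.0)
```

A high rate on one side only, as in this toy case, is the kind of signal a supervised step could use to decide whether the data or the declared domain or range should be revised.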