Evaluating the Gap Between an RDF Dataset and Its Schema

An increasing number of linked datasets are published on the Web using RDF(S)/OWL. The availability of a schema describing these datasets is crucial for their meaningful use. A dataset may contain schema-related information; however, these languages do not impose any constraint on the structure of the data, so a gap may exist between the schema and the actual instances. In this paper, we tackle the problem of evaluating this gap. We present an approach that relies on both type and class profiles, together with a set of quality metrics. We also present experimental evaluations that illustrate the use of the proposed metrics.
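To make the notion of a schema/instance gap concrete, here is a minimal sketch in plain Python. It is an illustration under stated assumptions, not the type and class profiles or the quality metrics proposed in the paper, which this abstract does not define. The toy triples, the `ex:` names, and the helper functions `declared_properties`, `used_properties`, and `gap` are all hypothetical. The sketch treats `rdfs:domain` statements as the declared schema and compares them with the properties that typed instances actually use.

```python
from collections import defaultdict

RDF_TYPE = "rdf:type"
RDFS_DOMAIN = "rdfs:domain"

# Hypothetical toy dataset: schema and instance statements share one graph.
triples = [
    ("ex:name", RDFS_DOMAIN, "ex:Person"),
    ("ex:email", RDFS_DOMAIN, "ex:Person"),
    ("ex:alice", RDF_TYPE, "ex:Person"),
    ("ex:alice", "ex:name", "Alice"),
    ("ex:alice", "ex:homepage", "http://example.org/alice"),
    ("ex:bob", RDF_TYPE, "ex:Person"),
    ("ex:bob", "ex:name", "Bob"),
]


def declared_properties(triples):
    """Map each class to the properties whose rdfs:domain names it."""
    declared = defaultdict(set)
    for s, p, o in triples:
        if p == RDFS_DOMAIN:
            declared[o].add(s)
    return declared


def used_properties(triples):
    """Map each class to the properties its typed instances actually carry."""
    types = defaultdict(set)
    for s, p, o in triples:
        if p == RDF_TYPE:
            types[s].add(o)
    used = defaultdict(set)
    for s, p, o in triples:
        if p not in (RDF_TYPE, RDFS_DOMAIN):
            for cls in types.get(s, ()):
                used[cls].add(p)
    return used


def gap(triples):
    """Report, per class, the mismatch between declared and used properties."""
    declared = declared_properties(triples)
    used = used_properties(triples)
    report = {}
    for cls in set(declared) | set(used):
        report[cls] = {
            "declared_but_unused": declared[cls] - used[cls],
            "used_but_undeclared": used[cls] - declared[cls],
        }
    return report


if __name__ == "__main__":
    for cls, diff in gap(triples).items():
        print(cls, diff)
```

In this toy graph, `ex:email` is declared for `ex:Person` but no instance uses it, while `ex:homepage` is used by an instance but never declared. A real evaluation would also have to account for subclass and subproperty inference, which this sketch deliberately ignores.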
