Validating RDF Data Quality Using Constraints to Direct the Development of Constraint Languages

For research institutes, data libraries, and data archives, validating RDF data against predefined constraints is a much sought-after feature, particularly because such validation is taken for granted in the XML world. Based on our work in the DCMI RDF Application Profiles Task Group and in cooperation with the W3C Data Shapes Working Group, we have to date identified and published 81 types of constraints that various stakeholders require for data applications. In this paper, in collaboration with several domain experts, we formulate 115 constraints on three vocabularies (DDI-RDF, QB, and SKOS) and classify them according to (1) the severity of a violation and (2) the complexity of expressing the constraint in common constraint languages. We evaluate the data quality of 15,694 data sets (4.26 billion triples) of research data for the social, behavioral, and economic sciences, obtained from 33 SPARQL endpoints. Based on the results, we formulate several findings to direct the further development of constraint languages.
