Explanations for Non-validation in SHACL

The Shape Constraint Language (SHACL) is a recently standardized language for expressing constraints on RDF graphs. It is the result of industrial and academic efforts to provide solutions for checking the quality of RDF graphs and for declaratively describing (parts of) their structure. We recommend [9] for an introduction to SHACL and its close relative ShEx. Among other things, the SHACL standard provides a syntax for writing down constraints and describes how RDF graphs should be validated with respect to a given set of SHACL constraints. However, some aspects of validation are not completely specified in the standard, such as the semantics of validation for constraints with cyclic dependencies. To address these shortcomings, several formalizations of SHACL grounded in logic-based languages with clear semantics have recently emerged [7,2,11].

In SHACL, the basic computational problem is to check whether a given RDF graph G validates a SHACL document (C, T), where C is a specification of validation rules (constraints) and T is a specification of the nodes to which the validation rules should apply (targets). To make SHACL truly useful and widely accepted, we need automated tools that not only implement validation, which yields a “yes” or “no” answer, but also help users understand why a given graph does or does not validate against a given document. The SHACL specification stresses the importance of explaining validation outcomes and introduces the notion of validation reports for this purpose. If a graph validates a document, the standard gives clear guidance on what the validation reports should look like. The situation is different when the graph does not validate: the principles of validation reports in case of non-validation are left largely open, and the standard specifies little beyond requiring that the violated node and constraint be indicated.
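To make the decision problem concrete, the following is a minimal sketch of checking whether a graph G validates a document (C, T) and of the kind of report the standard minimally requires on failure. It is a deliberately simplified toy, not SHACL semantics: graphs are sets of triples, each shape is a flat list of non-recursive constraints, and the names `MinCount`, `HasType`, `satisfies`, `validate`, `PersonShape`, and the `ex:` nodes are all hypothetical. Cyclic shape references, whose semantics the standard leaves open, are not modelled.

```python
from dataclasses import dataclass
from typing import Dict, List, Set, Tuple, Union

Triple = Tuple[str, str, str]


@dataclass(frozen=True)
class MinCount:
    # Hypothetical toy constraint: the node needs at least `n` outgoing `prop` edges.
    prop: str
    n: int


@dataclass(frozen=True)
class HasType:
    # Hypothetical toy constraint: the graph must contain (node, "rdf:type", cls).
    cls: str


Constraint = Union[MinCount, HasType]


def satisfies(graph: Set[Triple], node: str, c: Constraint) -> bool:
    """Check a single toy constraint at a single node."""
    if isinstance(c, MinCount):
        return sum(1 for (s, p, _) in graph if s == node and p == c.prop) >= c.n
    return (node, "rdf:type", c.cls) in graph


def validate(
    graph: Set[Triple],
    constraints: Dict[str, List[Constraint]],
    targets: List[Tuple[str, str]],
) -> List[Tuple[str, str, Constraint]]:
    """Return (node, shape, violated constraint) triples.

    An empty list means the graph validates (C, T); otherwise each entry
    carries only the node and the violated constraint, i.e. the minimum
    information the standard asks for, and nothing about *why* it failed.
    """
    violations = []
    for node, shape in targets:
        for c in constraints[shape]:
            if not satisfies(graph, node, c):
                violations.append((node, shape, c))
    return violations


# Toy data: every target person should be typed ex:Person and have an email.
G = {
    ("ex:alice", "rdf:type", "ex:Person"),
    ("ex:alice", "ex:email", '"alice@example.org"'),
    ("ex:bob", "rdf:type", "ex:Person"),
}
C = {"PersonShape": [HasType("ex:Person"), MinCount("ex:email", 1)]}
T = [("ex:alice", "PersonShape"), ("ex:bob", "PersonShape")]

report = validate(G, C, T)
# report == [("ex:bob", "PersonShape", MinCount("ex:email", 1))]
```

Here the answer is “no”, and the report names only `ex:bob` and the violated `MinCount` constraint. Even in this tiny setting, such a report does not say what about the graph is responsible for the failure, which is exactly the gap discussed next.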
It is not hard to see that, in general, there may be a very large number of possible reasons for a specific validation target to fail, and it is far from obvious what should be presented to the user in validation reports. This is precisely the topic of our study.