Magic Shapes for Validation in SHACL

The Shape Constraint Language (SHACL) was recently standardized by the W3C as a formalism for checking the quality of RDF graphs; we refer to [7] for an introduction. In SHACL, the main problem is to check whether a given RDF graph G validates a SHACL document (C, T), where C is a set of constraints (also called a shapes graph), each associated with a so-called shape name, and T (the targets) is a specification of nodes from the data graph that should validate certain shapes from C. For illustration, consider a graph G = {enrolledIn(Ben, C1)} and a SHACL document (C, T), where C = {Student ↔ ∃enrolledIn.Course} and T is the shape atom Student(Ben). The constraint states that each Student must be enrolled in some course; Student is a shape name, and enrolledIn and Course are data predicates, i.e., a property and a class name, respectively. Clearly, G does not validate (C, T), but the extended graph G′ = G ∪ {Course(C1)} does. The standard specifies a syntax for expressing SHACL constraints and describes when they are validated by RDF graphs. However, it leaves undefined the semantics of recursive constraints, i.e., constraints that involve cyclic dependencies. To address this, some logic-based proposals to formalize the semantics of full SHACL have emerged recently. Andresel et al. [1] proposed a semantics based on the stable model semantics for logic programs, which is stricter than the semantics based on classical logic due to Corman et al. [4]. Both semantics coincide with the official recommendation for non-recursive constraints, and unfortunately, the validation problem is NP-complete under both. To make SHACL truly useful and facilitate its adoption, we need automated tools that efficiently implement validation and scale well in the presence of large RDF graphs and sets of constraints. There are already significant efforts in this direction for fragments of SHACL [3, 6].
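The running example above (graph G, constraint Student ↔ ∃enrolledIn.Course, target Student(Ben)) can be checked by hand, and the following is a minimal Python sketch of that check. It is not a SHACL validator: the function name `validates`, the tuple encoding of facts, and the hard-coded constraint are assumptions made purely for illustration, and the sketch covers only this single non-recursive constraint, for which the shape assignment is uniquely determined by the data.

```python
# Illustrative sketch only: hard-codes the constraint
#   Student <-> EXISTS enrolledIn.Course
# Facts are encoded as tuples (an assumption, not SHACL syntax):
#   class facts:    ("Course", "C1")
#   property facts: ("enrolledIn", "Ben", "C1")

def validates(class_facts, property_facts, targets):
    """Return True if every target node satisfies the Student shape."""

    def satisfies_student(node):
        # The node needs at least one enrolledIn-successor that is a Course.
        return any(
            pred == "enrolledIn"
            and subj == node
            and ("Course", obj) in class_facts
            for (pred, subj, obj) in property_facts
        )

    return all(satisfies_student(node) for node in targets)


# G = {enrolledIn(Ben, C1)}, T = {Student(Ben)}
g_properties = {("enrolledIn", "Ben", "C1")}
g_classes = set()
targets = {"Ben"}

print(validates(g_classes, g_properties, targets))  # False

# G' = G u {Course(C1)}
g_prime_classes = g_classes | {("Course", "C1")}
print(validates(g_prime_classes, g_properties, targets))  # True
```

Because the constraint is non-recursive, a single pass over the data suffices here; recursive constraints with negation would instead require searching for a consistent global assignment of shapes to nodes.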
Shacl2Sparql [3] is a SHACL validation engine that checks conformance of RDF graphs with SHACL constraints by evaluating SPARQL queries against the data, and it optimizes the order in which shapes are processed. Further optimization techniques are implemented in Trav-Shacl [6]. However, these works focus on tractable fragments of SHACL. They do not handle the unrestricted interaction of recursion and negation in SHACL constraints, which calls for verifying whether there exists some global assignment of shapes to nodes in the graph that is consistent with all the constraints. This cannot be done easily using the top-down approaches implemented in existing validators. To our knowledge, the only implementation that validates full SHACL