Validating RDF with Shape Expressions

We propose shape expression schemas (ShEx), a novel schema formalism for describing the topology of an RDF graph that uses regular bag expressions (RBEs) to define constraints on the admissible neighborhood of the nodes of a given type. We provide two alternative semantics, multi-type and single-type, depending on whether or not a node may have more than one type. We study the expressive power of ShEx and the complexity of the validation problem. We show that the single-type semantics is strictly more expressive than the multi-type semantics, that single-type validation is generally intractable, and that multi-type validation is feasible for a small class of RBEs. To further curb the high computational complexity of validation, we propose a natural notion of determinism and show that multi-type validation for the class of deterministic schemas using single-occurrence regular bag expressions (SORBEs) is tractable. Finally, we consider the problem of validating only a fragment of a graph with preassigned types for some of its nodes, and argue that for deterministic ShEx using SORBEs, multi-type validation can be performed efficiently and single-type validation can be performed with a single pass over the graph.
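To make the idea of multi-type validation concrete, the following is a minimal sketch and not the algorithm from the paper. It assumes a much-simplified schema encoding in which each type lists every allowed predicate exactly once, together with one target type and an occurrence interval (a rough stand-in for a deterministic schema over SORBEs without disjunction). The names `SCHEMA`, `EDGES`, `satisfies`, and `maximal_typing` are hypothetical. The sketch starts by assigning every type to every node and repeatedly discards types whose neighborhood constraint fails, under the assumed reading that a graph validates when every node keeps at least one type.

```python
from collections import Counter

# Hypothetical encoding: type -> {predicate: (target_type, min, max)}.
# max=None means "unbounded". Each predicate appears once per type.
SCHEMA = {
    "Bug": {
        "descr": ("Literal", 1, 1),
        "reportedBy": ("User", 1, 1),
        "related": ("Bug", 0, None),
    },
    "User": {"name": ("Literal", 1, 1), "email": ("Literal", 0, 1)},
    "Literal": {},  # no outgoing edges allowed
}

# Illustrative graph as (subject, predicate, object) triples.
EDGES = [
    ("bug1", "descr", "lit1"),
    ("bug1", "reportedBy", "user1"),
    ("bug1", "related", "bug2"),
    ("bug2", "descr", "lit2"),
    ("bug2", "reportedBy", "user1"),
    ("user1", "name", "lit3"),
    ("bug3", "descr", "lit4"),  # bug3 has no reportedBy edge
]


def satisfies(node, type_name, edges, typing, schema):
    """Check the outgoing neighborhood of `node` against one type."""
    rule = schema[type_name]
    out = [(p, o) for (s, p, o) in edges if s == node]
    # Every outgoing edge must use an allowed predicate and point to a
    # node that still carries the required target type.
    for p, o in out:
        if p not in rule or rule[p][0] not in typing[o]:
            return False
    # Every predicate count must fall inside its interval.
    counts = Counter(p for p, _ in out)
    for p, (_, lo, hi) in rule.items():
        n = counts[p]
        if n < lo or (hi is not None and n > hi):
            return False
    return True


def maximal_typing(edges, schema):
    """Refine the all-types assignment until no type can be removed."""
    nodes = {s for s, _, _ in edges} | {o for _, _, o in edges}
    typing = {n: set(schema) for n in nodes}
    changed = True
    while changed:
        changed = False
        for n in nodes:
            for t in list(typing[n]):
                if not satisfies(n, t, edges, typing, schema):
                    typing[n].discard(t)
                    changed = True
    return typing
```

Calling `maximal_typing(EDGES, SCHEMA)` returns a dictionary from nodes to the sets of types they can keep. A node that ends up with an empty set (here `bug3`, which lacks a `reportedBy` edge) indicates that the graph does not validate under this simplified reading. Because removing a type from one node can invalidate a type on its predecessors, the loop runs until nothing changes.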
