The complexity of regular expressions and property paths in SPARQL

The World Wide Web Consortium (W3C) recently introduced property paths in SPARQL 1.1, a query language for RDF data. Property paths allow SPARQL queries to evaluate regular expressions over graph-structured data. However, they differ from standard regular expressions in several notable respects. For example, they have a limited form of negation, they have numerical occurrence indicators as syntactic sugar, and their semantics on graphs is defined in a nonstandard manner. We formalize the W3C semantics of property paths and investigate various query evaluation problems on graphs. More specifically, let x and y be two nodes in an edge-labeled graph and let r be an expression. We study the complexity of (1) deciding whether there exists a path from x to y that matches r and (2) counting how many paths from x to y match r. Our main results show that, compared to an alternative semantics of regular expressions on graphs, the complexity of (1) and (2) under W3C semantics is significantly higher. Whereas the alternative semantics remains in polynomial time for large fragments of expressions, the W3C semantics makes problems (1) and (2) intractable almost immediately. As a side result, we prove that the membership problem for regular expressions with numerical occurrence indicators and negation is in polynomial time.
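To make problems (1) and (2) concrete, the following is a minimal Python sketch of the textbook product construction between an edge-labeled graph and a finite automaton. It illustrates only an existential, arbitrary-path reading of regular expressions on graphs and does not implement the W3C semantics. It is not taken from the paper: the function names, the triple-based graph encoding, and the hand-built automaton for `a b*` are all assumptions made for illustration. Counting is restricted to paths of a fixed length, an assumption made here so that the count stays finite on cyclic graphs.

```python
from collections import deque


def build_adjacency(edges):
    """Index (source, label, target) triples by source node."""
    adj = {}
    for source, label, target in edges:
        adj.setdefault(source, []).append((label, target))
    return adj


def exists_matching_path(edges, automaton, x, y):
    """Problem (1), sketched: is there a path from x to y whose label
    sequence is accepted by the automaton? Breadth-first search over
    (graph node, automaton state) pairs."""
    adj = build_adjacency(edges)
    start = (x, automaton["start"])
    seen = {start}
    queue = deque([start])
    while queue:
        node, state = queue.popleft()
        if node == y and state in automaton["accept"]:
            return True
        for label, target in adj.get(node, []):
            for next_state in automaton["delta"].get((state, label), ()):
                pair = (target, next_state)
                if pair not in seen:
                    seen.add(pair)
                    queue.append(pair)
    return False


def count_matching_paths(edges, automaton, x, y, length):
    """Problem (2), sketched: count paths from x to y with exactly
    `length` edges whose labels are accepted. The result equals the number
    of paths only if the automaton is deterministic; otherwise it counts
    accepting runs."""
    adj = build_adjacency(edges)
    counts = {(x, automaton["start"]): 1}
    for _ in range(length):
        next_counts = {}
        for (node, state), c in counts.items():
            for label, target in adj.get(node, []):
                for next_state in automaton["delta"].get((state, label), ()):
                    key = (target, next_state)
                    next_counts[key] = next_counts.get(key, 0) + c
        counts = next_counts
    return sum(c for (node, state), c in counts.items()
               if node == y and state in automaton["accept"])


# Hypothetical example data: a small graph and a deterministic automaton
# for the expression a b* (no epsilon transitions).
edges = [
    ("n1", "a", "n2"),
    ("n2", "b", "n2"),
    ("n2", "b", "n3"),
    ("n3", "b", "n2"),
    ("n3", "c", "n4"),
]
automaton = {
    "start": 0,
    "accept": {1},
    "delta": {(0, "a"): {1}, (1, "b"): {1}},
}

print(exists_matching_path(edges, automaton, "n1", "n3"))
print(count_matching_paths(edges, automaton, "n1", "n3", 4))
```

The search visits each (node, state) pair at most once, so the decision procedure runs in time polynomial in the sizes of the graph and the automaton. This is the kind of tractable behavior the abstract attributes to the alternative semantics, in contrast to the W3C semantics.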
