Answering regular path queries on workflow provenance

This paper proposes a novel approach for efficiently evaluating regular path queries over provenance graphs of workflows that may include recursion. The approach assumes that an execution g of a workflow G has been labeled with query-agnostic reachability labels using an existing technique. At query time, given g, G, and a regular path query R, the approach decomposes R into a set of subqueries R1, ..., Rk that are safe for G. For each safe subquery Ri, G is rewritten so that, using the reachability labels of nodes in g, it can be decided in constant time whether a path matching Ri exists between two nodes. The results of the safe subqueries are then composed, possibly with a small unsafe remainder, to produce an answer to R. Compared with existing techniques, the resulting algorithm significantly reduces the number of subqueries k by making each subquery larger and more complex, and it evaluates each subquery in time bounded by its input and output size. Experimental results demonstrate the benefit of this approach.
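To make concrete what it means for a path between two nodes to match a regular path query, the following is a minimal sketch of a generic baseline: a breadth-first traversal of the product of an edge-labeled graph and a deterministic finite automaton. It is not the label-based method proposed in the paper, which avoids this kind of traversal for safe subqueries. The function name `rpq_reachable`, the edge-triple encoding, and the toy graph are all assumptions made for illustration.

```python
from collections import deque


def rpq_reachable(edges, dfa, start_state, accepting, source):
    """Return nodes reachable from `source` via a path whose edge labels
    spell a word accepted by the DFA (baseline product-graph search).

    edges:     iterable of (from_node, label, to_node) triples (assumed encoding)
    dfa:       dict mapping (state, label) -> next state; missing keys reject
    accepting: set of accepting DFA states
    """
    # Build an adjacency list: node -> [(label, target), ...]
    adj = {}
    for u, label, v in edges:
        adj.setdefault(u, []).append((label, v))

    # Search over (graph node, automaton state) pairs.
    start = (source, start_state)
    seen = {start}
    queue = deque([start])
    answers = set()

    while queue:
        node, state = queue.popleft()
        if state in accepting:
            answers.add(node)
        for label, target in adj.get(node, []):
            next_state = dfa.get((state, label))
            if next_state is None:
                continue  # the automaton has no transition on this label
            pair = (target, next_state)
            if pair not in seen:  # visited check keeps cycles finite
                seen.add(pair)
                queue.append(pair)
    return answers


# Hypothetical toy execution graph containing a cycle (b -> c -> d -> b).
edges = [
    ("a", "x", "b"),
    ("b", "y", "c"),
    ("c", "y", "d"),
    ("d", "y", "b"),
    ("d", "z", "e"),
]

# DFA for the regular expression x y* : state 0 --x--> 1, state 1 --y--> 1.
dfa = {(0, "x"): 1, (1, "y"): 1}

result = rpq_reachable(edges, dfa, start_state=0, accepting={1}, source="a")
print(sorted(result))
```

The traversal visits each (node, state) pair at most once, so its cost grows with the size of the graph times the size of the automaton. The abstract's claim of constant-time decisions per node pair for safe subqueries is a contrast with this baseline.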
