Efficient evaluation of generalized path pattern queries on XML data

Finding the occurrences of structural patterns in XML data is a key operation in XML query processing. Existing algorithms for this operation focus almost exclusively on path-patterns or tree-patterns. Requirements for flexible querying of XML data have recently motivated the introduction of query languages that allow a partial specification of path-patterns in a query. In this paper, we focus on the efficient evaluation of partial path queries, a generalization of path pattern queries. Our approach explicitly handles repeated labels (that is, multiple occurrences of the same label in a query). We show that partial path queries can be represented as rooted dags for which a topological ordering of the nodes exists. We present three algorithms for the efficient evaluation of these queries under the indexed streaming evaluation model. The first exploits a structural summary of the data to generate a set of path-patterns that together are equivalent to a partial path query. To evaluate these path-patterns, we extend PathStack so that it can work on path-patterns with repeated labels. The second extracts a spanning tree from the query dag, uses a stack-based algorithm to find the matches of the root-to-leaf paths in the tree, and merge-joins the matches to compute the answer. Finally, the third exploits multiple pointers of stack entries and a topological ordering of the query dag to apply a stack-based holistic technique. An analysis of the algorithms and an extensive experimental evaluation show that the holistic algorithm outperforms the other two.
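As a rough illustration of the dag representation mentioned above, the following Python sketch computes a topological ordering of a small rooted query dag with Kahn's algorithm. It is not the paper's algorithm: the function name `topological_order`, the edge-list encoding, and the example query are all assumptions made for illustration. Repeated labels are modeled here by giving each query node a distinct identifier (`a1` and `a2` both stand for label `a`).

```python
from collections import deque


def topological_order(edges, root):
    """Return a topological ordering of a rooted DAG given as (parent, child) pairs.

    Raises ValueError if the graph contains a cycle.
    Hypothetical helper for illustration only.
    """
    nodes = {root}
    successors = {}
    indegree = {}
    for parent, child in edges:
        nodes.update((parent, child))
        successors.setdefault(parent, []).append(child)
        indegree[child] = indegree.get(child, 0) + 1

    # Start from every node with no incoming edge (the root, in a rooted DAG).
    ready = deque(sorted(n for n in nodes if indegree.get(n, 0) == 0))
    order = []
    while ready:
        node = ready.popleft()
        order.append(node)
        for child in successors.get(node, []):
            indegree[child] -= 1
            if indegree[child] == 0:
                ready.append(child)

    if len(order) != len(nodes):
        raise ValueError("query graph has a cycle; no topological ordering exists")
    return order


# Assumed example query: node c has two parents, and label "a" occurs twice
# (as the distinct query nodes a1 and a2).
query_edges = [("r", "a1"), ("r", "b"), ("a1", "c"), ("b", "c"), ("c", "a2")]
order = topological_order(query_edges, "r")
print(order)
```

Every query node appears in the ordering after all of its parents, which is the property a holistic evaluation can rely on when it processes query nodes in a fixed sequence.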
