Holistic Join for Generalized Tree Patterns

We consider the problem of evaluating an XQuery query Q (involving only child and descendant axes) on an XML document D. D is stored on disk and is read sequentially in document order. Chen et al. [From Tree Patterns to Generalized Tree Patterns: on efficient evaluation of XQuery, Proceedings of the International Conference on Very Large Data Bases (VLDB), 2003, pp. 237-248] presented an algorithm that converts Q (from a large fragment of XQuery) into a Generalized Tree Pattern GTP(Q) and a set J(Q) of value join conditions on its vertices. Evaluating Q on D reduces to finding the matches for GTP(Q) in D. We present an efficient algorithm for finding these matches. Excluding the computation of the value joins J(Q), our algorithm performs two linear passes over the data and runs in O(d|Q|) memory space, where d denotes the depth of D; its runtime and disk I/O are O(|Q|·|D|). If separate input streams of document nodes for the individual vertices in GTP(Q) are available, our runtime and disk I/O are linear in the input size, which is trivially optimal.
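To make the O(d|Q|) memory bound concrete, here is a minimal Python sketch. It is not the algorithm from the paper. It handles only plain tree patterns with child and descendant edges, with no GTP features such as optional edges or grouped return nodes, and no value joins. It also makes a single bottom-up pass rather than two. The names `QNode` and `bottom_up_matches` are hypothetical. The sketch reads D in document order with `xml.etree.ElementTree.iterparse` and keeps one stack frame per open element. Each frame holds two sets of pattern-vertex ids, so the stack never exceeds d frames of size at most |Q|. Each element is checked against every pattern vertex, which gives roughly O(|Q|·|D|) time.

```python
import io
import xml.etree.ElementTree as ET
from dataclasses import dataclass, field


@dataclass
class QNode:
    """Hypothetical tree-pattern vertex (no GTP extensions)."""
    tag: str
    axis: str = "desc"  # edge to the parent vertex: "child" (/) or "desc" (//)
    children: list = field(default_factory=list)


def preorder(root):
    """Return pattern vertices in preorder; index 0 is the pattern root."""
    out = []

    def walk(q):
        out.append(q)
        for c in q.children:
            walk(c)

    walk(root)
    return out


def bottom_up_matches(source, pattern_root):
    """One document-order pass over `source` (a file name or binary file object).

    Returns the preorder positions of elements at which the whole
    subpattern rooted at `pattern_root` is satisfied.
    """
    qs = preorder(pattern_root)
    qid = {id(q): i for i, q in enumerate(qs)}
    # One frame per open element: (preorder position,
    #   ids matched by some child, ids matched by some proper descendant).
    # Depth <= d and each set has <= |Q| ids, so the stack is O(d|Q|).
    stack = []
    next_pos = 0
    hits = []
    for event, elem in ET.iterparse(source, events=("start", "end")):
        if event == "start":
            stack.append((next_pos, set(), set()))
            next_pos += 1
            continue
        pos, child_hits, desc_hits = stack.pop()
        matched = set()
        for i, q in enumerate(qs):
            if q.tag != elem.tag:
                continue
            # Every pattern child must be matched below this element,
            # as a direct child for "/" or anywhere below for "//".
            if all(qid[id(c)] in (child_hits if c.axis == "child" else desc_hits)
                   for c in q.children):
                matched.add(i)
        if 0 in matched:
            hits.append(pos)
        if stack:  # propagate this element's results to its parent frame
            _, parent_child, parent_desc = stack[-1]
            parent_child |= matched
            parent_desc |= matched | desc_hits
        # Drop content we no longer need. ElementTree still keeps empty
        # stubs under the root; a true streaming parser would not.
        elem.clear()
    return sorted(hits)  # hits are found in postorder; report in document order


if __name__ == "__main__":
    doc = b"<a><b><c/></b><d/><b><e><c/></e></b></a>"
    pattern = QNode("b", children=[QNode("c", axis="child")])
    print(bottom_up_matches(io.BytesIO(doc), pattern))
```

In this sample document the elements are numbered a=0, b=1, c=2, d=3, b=4, e=5, c=6. The pattern `b/c` matches only at position 1. The pattern `b//c` matches at positions 1 and 4, because the second `b` reaches `c` only through `e`. A complete GTP evaluator would need a further pass to enforce constraints coming from ancestors and to assemble the actual matches, which this sketch does not attempt.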
