Structural join order selection for XML query optimization

Structural join operations are central to evaluating queries against XML data, and they typically consume the lion's share of query processing time. Structural join order selection is therefore at the heart of query optimization in an XML database, just as (value-based) join order selection is central to relational query optimization. We introduce five algorithms for structural join order optimization for XML tree pattern matching and present an extensive experimental evaluation. Our experiments demonstrate that many relational rules of thumb are no longer appropriate: for instance, dynamic-programming-style optimization is not efficient, and limiting consideration to left-deep plans usually misses the best solution. Our experiments also show that a dynamic programming optimization with pruning (DPP) algorithm can find the optimal solution at low cost relative to the traditional dynamic programming (DP) algorithm, and that an optimization technique that considers only fully pipelined (FP) plans can very quickly choose a plan that in most cases is close to optimal. We recommend DPP for XML query optimizers where query execution time is expected to be significant, and FP where it is important to find a good (but not necessarily the best) plan quickly.
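
To make the left-deep versus bushy distinction concrete, the following is a minimal sketch of plain dynamic programming over the connected sub-patterns of a tree pattern. It is not one of the five algorithms from the paper (it implements neither DPP nor FP), and the cost model, the cardinalities, the selectivities, and every function name are assumptions chosen purely for illustration.

```python
from itertools import combinations

def connected(nodes, edges):
    """True if `nodes` induce a connected subgraph of the pattern tree."""
    nodes = set(nodes)
    start = next(iter(nodes))
    seen, stack = {start}, [start]
    while stack:
        u = stack.pop()
        for a, b in edges:
            v = b if a == u else a if b == u else None
            if v is not None and v in nodes and v not in seen:
                seen.add(v)
                stack.append(v)
    return seen == nodes

def est_card(nodes, card, sel):
    """Toy estimate: product of node cardinalities and edge selectivities."""
    size = 1.0
    for n in nodes:
        size *= card[n]
    for (a, b), s in sel.items():
        if a in nodes and b in nodes:
            size *= s
    return size

def best_plan(card, sel, left_deep=False):
    """Return (cost, plan) for joining all pattern nodes.

    Hypothetical cost of one structural join: |left| + |right| + |output|.
    With left_deep=True the right input must be a single pattern node.
    """
    edges = list(sel)
    names = sorted(card)
    best = {frozenset([n]): (0.0, n) for n in names}
    for k in range(2, len(names) + 1):
        for combo in combinations(names, k):
            s = frozenset(combo)
            if not connected(s, edges):
                continue  # skip sub-patterns that would need a Cartesian product
            out = est_card(s, card, sel)
            for r in range(1, k):
                for left_combo in combinations(combo, r):
                    left = frozenset(left_combo)
                    right = s - left
                    if left not in best or right not in best:
                        continue
                    if left_deep and len(right) != 1:
                        continue
                    cost = (best[left][0] + best[right][0]
                            + est_card(left, card, sel)
                            + est_card(right, card, sel) + out)
                    if s not in best or cost < best[s][0]:
                        best[s] = (cost, (best[left][1], best[right][1]))
    return best[frozenset(names)]

# Assumed chain pattern a-b-c-d: the outer edges are selective, the middle one is not.
card = {"a": 1000, "b": 1000, "c": 1000, "d": 1000}
sel = {("a", "b"): 1e-5, ("b", "c"): 0.1, ("c", "d"): 1e-5}

bushy_cost, bushy_plan = best_plan(card, sel)
deep_cost, deep_plan = best_plan(card, sel, left_deep=True)
print(bushy_plan, bushy_cost)
print(deep_plan, deep_cost)
```

Under these assumed numbers the unrestricted search picks a bushy plan that joins the two selective edges first and is cheaper than the best left-deep plan, which is the kind of effect the abstract describes. The sketch enumerates every connected sub-pattern exhaustively, so it also illustrates why unpruned dynamic programming becomes expensive as patterns grow.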
