Towards Cost-based Query Optimization in Native XML Database Management Systems

In the last few years, XML has become a de facto standard for the exchange of structured and semi-structured data. The database research community has responded by proposing native XML database management systems for efficient and transactional management of XML documents. One of the most important success factors for such systems is a powerful query optimizer. Many researchers have proposed sophisticated Structural Join and Holistic Twig Join algorithms as well as several index structures supporting the evaluation of twig query patterns. Even though almost all XML query evaluation approaches proposed so far use some of these methods, we believe that they do not provide sufficient input for real-world cost-based query optimization scenarios, because they cover only a small part of the overall query evaluation process. To provide adequate input for a cost-based XML query optimizer, we propose the XML Query Graph Model as a new internal representation enabling a smooth transition between the XQuery language level and physical algebra operators. Furthermore, we introduce a set of rewrite rules for improving the execution of twig queries, e.g., by fusing two adjacent binary join operators into a complex n-way join operator. By presenting further rewrite rules, we make the most of existing joins and indexes, even before query transformation. Using these concepts, we are ready to sketch their integration into our upcoming cost-based XML query optimizer.
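
To illustrate the kind of rewrite rule mentioned above, the following minimal sketch fuses adjacent binary join operators into a single n-way join operator. It is not the paper's XML Query Graph Model or its actual rule set: the operator classes `Scan`, `BinaryJoin`, and `NWayJoin` and the function `fuse_joins` are hypothetical names introduced only for this example, and the sketch ignores join predicates (axes) and cost estimates, which a real optimizer would have to check before applying such a rule.

```python
from dataclasses import dataclass
from typing import Tuple, Union


@dataclass(frozen=True)
class Scan:
    """Leaf operator delivering all elements with a given name (hypothetical)."""
    name: str


@dataclass(frozen=True)
class BinaryJoin:
    """Binary structural join over two input plans (hypothetical)."""
    left: "Plan"
    right: "Plan"


@dataclass(frozen=True)
class NWayJoin:
    """N-way join over several input plans, e.g. a twig join (hypothetical)."""
    inputs: Tuple["Plan", ...]


Plan = Union[Scan, BinaryJoin, NWayJoin]


def fuse_joins(plan: Plan) -> Plan:
    """Fuse adjacent binary joins bottom-up into one n-way join.

    Assumption: every pair of adjacent joins may be fused. A real rule
    would first verify that the join predicates form a twig pattern.
    """
    if not isinstance(plan, BinaryJoin):
        return plan  # scans and already fused joins are left unchanged

    left = fuse_joins(plan.left)
    right = fuse_joins(plan.right)

    # A join whose inputs are not joins themselves has nothing to fuse with.
    if not isinstance(left, (BinaryJoin, NWayJoin)) and \
            not isinstance(right, (BinaryJoin, NWayJoin)):
        return BinaryJoin(left, right)

    # Otherwise pull the inputs of the child joins up into one operator.
    inputs = []
    for child in (left, right):
        if isinstance(child, BinaryJoin):
            inputs.extend((child.left, child.right))
        elif isinstance(child, NWayJoin):
            inputs.extend(child.inputs)
        else:
            inputs.append(child)
    return NWayJoin(tuple(inputs))


# Two adjacent binary joins over the element scans a, b, and c.
plan = BinaryJoin(BinaryJoin(Scan("a"), Scan("b")), Scan("c"))
fused = fuse_joins(plan)
```

In this sketch, a plan with two stacked binary joins over three scans is rewritten into one `NWayJoin` with three inputs, whereas a single binary join over two scans is returned unchanged. Whether the fused operator is actually cheaper is exactly the decision a cost-based optimizer would have to make.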
