Cost-sensitive reordering of navigational primitives

We present a method for evaluating path queries that is based on the novel concept of partial path instances. Our method (1) maximizes performance by means of sequential scans or asynchronous I/O, (2) does not require a special storage format, (3) relies on simple navigational primitives on trees, and (4) can be complemented by existing logical and physical optimizations such as duplicate elimination, duplicate prevention, and path rewriting.

We use a physical algebra that separates the navigation operations that require I/O from those that do not. All I/O operations necessary for the evaluation of a path are isolated in a single operator, which may employ efficient I/O scheduling strategies such as sequential scans or asynchronous I/O.

Performance results for queries from the XMark benchmark show that reordering the navigation operations can increase performance by up to a factor of four.
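The separation of in-memory navigation from I/O can be illustrated with a small sketch. It is not the algebra or the operator from the paper, only a toy model under stated assumptions: the page layout `PAGES`, the function `evaluate`, and the representation of a partial path instance as a pair (node id, step index) are all hypothetical, and only child steps are supported. Navigation that would cross a page boundary is not followed immediately. It is parked as a partial path instance, and a single loop decides which page to read next (here: the lowest pending page id, mimicking a sequential scan).

```python
from collections import defaultdict

# Hypothetical storage layout (an assumption for illustration):
# page id -> {node id: (tag, [(child page id, child node id), ...])}
PAGES = {
    0: {0: ("site", [(2, 0), (1, 0)])},
    1: {0: ("people", [(1, 1), (2, 1)]), 1: ("person", [])},
    2: {0: ("regions", []), 1: ("person", [])},
}


def evaluate(path, root=(0, 0)):
    """Evaluate a child-axis path such as ["site", "people", "person"].

    Returns the matching (page id, node id) pairs and the order in
    which pages were read.
    """
    # Parked partial path instances, grouped by the page they need.
    pending = defaultdict(list)
    pending[root[0]].append((root[1], 0))
    results, reads = [], []
    while pending:
        # I/O scheduling decision: lowest pending page first.
        page_id = min(pending)
        page = PAGES[page_id]  # the only place where "I/O" happens
        reads.append(page_id)
        stack = pending.pop(page_id)
        while stack:  # in-memory navigation only
            node_id, step = stack.pop()
            tag, children = page[node_id]
            if tag != path[step]:
                continue
            if step == len(path) - 1:
                results.append((page_id, node_id))
                continue
            for child_page, child_id in children:
                if child_page == page_id:
                    stack.append((child_id, step + 1))
                else:
                    # Crossing a page boundary: park instead of reading now.
                    pending[child_page].append((child_id, step + 1))
    return sorted(results), reads


results, reads = evaluate(["site", "people", "person"])
print(results)  # [(1, 1), (2, 1)]
print(reads)    # [0, 1, 2]
```

In this toy layout, a depth-first traversal that follows every reference immediately would read pages in the order 0, 2, 1, 2. Parking the cross-page continuations lets each page be read once, in ascending order.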
