Cost-based plan selection for XPath

We present a complete cost-based optimization and execution framework for XPath and demonstrate its effectiveness and efficiency for a variety of queries and datasets. The framework is based on a logical XPath algebra with novel features and operators and a comprehensive set of rewriting rules that together enable us to algebraically capture many existing and novel processing strategies for XPath queries. An important part of the framework is PSA, a very efficient cost-based plan selection algorithm for XPath queries. In the presented experimental evaluation, PSA picked the cheapest estimated query plan in 100% of the cases. Our cost-based query optimizer is independent of the underlying physical data model and storage system and of the available logical operator implementations, depending on a set of well-defined APIs. We also present an implementation of those APIs, including primitive access methods, a large pool of physical operators, statistics estimators, and cost models, and experimentally demonstrate the effectiveness of our end-to-end query optimization system.
