Pathfinder: relational XQuery over multi-gigabyte XML inputs in interactive time

Using a relational DBMS as the back-end engine for an XQuery processing system brings the relational query optimization and scalable query processing strategies of mature DBMS engines to the XML domain. Though a lot of theoretical work has been done in this area and various solutions have been proposed, no complete system has so far been made available to provide practical evidence that this is a viable approach. In this paper, we describe the purely relational XQuery processor Pathfinder, which has been built on top of the extensible RDBMS MonetDB. Performance results indicate that the system is capable of evaluating XQuery queries efficiently, even when the input XML documents become huge. We additionally present further contributions such as loop-lifted staircase join, techniques to derive order properties and to reduce sorting effort in the generated relational algebra plans, as well as methods for optimizing XQuery joins, which, taken together, enabled us to reach our performance and scalability goals.
