Pathfinder: XQuery Off the Relational Shelf

The Pathfinder project makes inventive use of relational database technology, originally developed to process data of strictly tabular shape, to construct efficient database-supported XML and XQuery processors. Pathfinder targets database engines that implement a set-oriented mode of query execution: many off-the-shelf traditional database systems make for suitable XQuery runtime environments, but a number of off-beat storage back-ends fit that bill as well. While Pathfinder has been developed with a close eye on the XQuery semantics, some of the techniques that we review here are generally useful for evaluating XQuery-style iterative languages on database back-ends.

1 The Rectangularization of XQuery: Purely Relational XML Processing

If you go back in time to dig for the semantic roots of XQuery [5], you will find that the language's core construct, the for–let–where–order by–return (FLWOR) block, is one particular incarnation of a very general idea: the comprehension [26]. Many language-related concepts may be uniformly understood in comprehension form, but comprehensions provide a particularly concise and elegant way to express iteration over collections of objects. In the case of XQuery, these collections are finite, ordered sequences of items, i.e., XML nodes and atomic values [1]. Any program or query expressed in comprehension form is subject to a number of useful equivalence-preserving rewriting rules (the monad laws), and so is XQuery's FLWOR block. Once you look closely, a wide range of seemingly XQuery-specific optimizations realized by compilers and interpreters today, e.g., for-loop fusion or unnesting, in fact put the monad laws to work.

The family of programming and query languages whose semantic core may be cast in comprehension form is large. Among its members, specifically, is SQL, the relational database language.
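To make the role of the monad laws concrete, the following is a minimal sketch in Python that models XQuery item sequences as plain lists. The helpers `unit` and `bind` and the sample functions `f` and `g` are illustrative assumptions, not part of Pathfinder or of any XQuery processor. The sketch checks the three monad laws on a small sequence and shows how for-loop fusion falls out of them.

```python
# Sequences of items are modeled as Python lists (an assumption made
# for illustration; XQuery sequences are flat and ordered, like these).

def unit(x):
    """Wrap a single item into a one-item sequence."""
    return [x]

def bind(xs, f):
    """Apply f (item -> sequence) to every item of xs and concatenate
    the results in order. This mirrors `for $x in xs return f($x)`."""
    return [y for x in xs for y in f(x)]

xs = [1, 2, 3]
f = lambda x: [x, x * 10]   # hypothetical loop body returning two items
g = lambda y: [y + 1]       # hypothetical loop body returning one item

# The three monad laws, checked on the sample data.
left_identity = bind(unit(2), f) == f(2)
right_identity = bind(xs, unit) == xs
associativity = bind(bind(xs, f), g) == bind(xs, lambda x: bind(f(x), g))

# for-loop fusion.
# Nested form:
#   for $y in (for $x in (1,2,3) return $x * $x) return $y + 1
nested = [y + 1 for y in [x * x for x in xs]]
# Associativity moves the outer loop into the body of the inner loop;
# left identity then removes the one-item intermediate sequence:
#   for $x in (1,2,3) return $x * $x + 1
fused = [x * x + 1 for x in xs]

print(left_identity, right_identity, associativity, nested == fused)
```

The fused form iterates once and never materializes the intermediate sequence, which is the kind of saving a comprehension-based rewrite can provide.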
This observation sparked a whole line of work that we review in the following pages: exploit the common semantic ground of XQuery and SQL and try to turn relational database systems (i.e., processors for strictly tabular, or rectangular, data) into efficient and scalable XQuery processors. XQuery processors of this type should be able to benefit from the 30+ years of research and engineering experience that shaped relational database technology. This is, in fact, what we repeatedly observed in the course of

Copyright 2008 IEEE. Personal use of this material is permitted. However, permission to reprint/republish this material for advertising or promotional purposes or for creating new collective works for resale or redistribution to servers or lists, or to reuse any copyrighted component of this work in other works must be obtained from the IEEE.

Bulletin of the IEEE Computer Society Technical Committee on Data Engineering

[1] Torsten Grust, et al. Purely Relational FLWORs, 2005, XIME-P.

[2] Brian Beckman, et al. LINQ: reconciling object, relations and XML in the .NET framework, 2006, SIGMOD Conference.

[3] Sherif Sakr, et al. A SQL:1999 code generator for the Pathfinder XQuery compiler, 2007, SIGMOD '07.

[4] Torsten Grust, et al. Staircase Join: Teach a Relational DBMS to Watch its (Axis) Steps, 2003, VLDB.

[5] Torsten Grust, et al. Why off-the-shelf RDBMSs are better at XPath than you might expect, 2007, SIGMOD '07.

[6] Patrick E. O'Neil, et al. ORDPATHs: insert-friendly XML node labels, 2004, SIGMOD '04.

[7] Torsten Grust, et al. An Injection of Tree Awareness: Adding Staircase Join to PostgreSQL, 2004, VLDB.

[8] Torsten Grust, et al. XQuery Join Graph Isolation, 2008, ArXiv.

[9] Torsten Grust, et al. MonetDB/XQuery: a fast XQuery processor powered by a relational engine, 2006, SIGMOD Conference.

[10] Ioana Manolescu, et al. XMark: A Benchmark for XML Data Management, 2002, VLDB.

[11] Tim Furche, et al. XPath: Looking Forward, 2002, EDBT Workshops.

[12] Philip Wadler, et al. Comprehending monads, 1990, Mathematical Structures in Computer Science.

[13] Torsten Grust, et al. eXrQuy: Order Indifference in XQuery, 2007, IEEE 23rd International Conference on Data Engineering.

[14] Scott Boag, et al. XQuery 1.0: An XML Query Language, 2007.

[15] Sherif Sakr, et al. XQuery on SQL Hosts, 2004, VLDB.

[16] Sherif Sakr, et al. Dependable cardinality forecasts for XQuery, 2008, Proc. VLDB Endow.

[17] Martin L. Kersten, et al. Optimizing database architecture for the new bottleneck: memory access, 2000, The VLDB Journal.

[18] Jennifer Widom, et al. Query Optimization for XML, 1999, VLDB.

[19] Jens Teubner, et al. Scalable XQuery type matching, 2008, EDBT '08.

[20] M. Tamer Özsu, et al. A comprehensive XQuery to SQL translation using dynamic interval encoding, 2003, SIGMOD '03.

[21] Torsten Grust, et al. Accelerating XPath evaluation in any RDBMS, 2004, TODS.

[22] Torsten Grust, et al. Jump Through Hoops to Grok the Loops: Pathfinder's Purely Relational Account of XQuery-style Iteration Semantics, 2008.