Processing Recursive XQuery over XML Streams: The Raindrop Approach

XML stream applications bring the challenge of efficiently processing queries over sequentially accessible, token-based data. Efficient query processing requires keeping memory usage low, which in turn requires that we avoid holding data in the query buffer by outputting it at the earliest possible time. In this paper, we propose a new class of stream algebra operators for efficient recursive XQuery stream processing. In particular, we propose two strategies for implementing structural joins: (a) the just-in-time structural join strategy, which processes joins efficiently as long as the input XML substreams are non-recursive, and (b) the recursive structural join strategy, which supports structural joins over recursive XML substreams at the added cost of tuple-level ID comparisons. Both strategies are complemented by an automata-driven invocation mechanism that triggers execution of the join at the first possible moment, upon recognizing the end of the targeted input stream subelement. Further, we design the structural join operator itself to be context-aware: at run-time it can switch from the efficient just-in-time join strategy, for elements recognized to be non-recursive, to the more powerful ID-based structural join strategy, for elements identified to be recursive. In addition, depending on whether the query is recursive, we generate the plan with cheaper operators whenever possible. We incorporate the proposed techniques into the Raindrop stream engine. We also report on experimental studies, conducted using data generated with ToXgene, which show that our techniques bring significant performance improvements.
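To make the difference between the two join strategies concrete, the following is a minimal sketch and not the Raindrop implementation. It assumes each element carries a (start, end) pair of token positions as its ID, and that an element is treated as recursive whenever more than one ancestor candidate is pending when the join is triggered. The names `Elem`, `just_in_time_join`, `recursive_join`, and `structural_join` are hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class Elem:
    """Hypothetical element record; start/end are token positions (assumed ID scheme)."""
    name: str
    start: int
    end: int


Pair = Tuple[Elem, Elem]


def just_in_time_join(ancestor: Elem, buffered: List[Elem]) -> List[Pair]:
    # Non-recursive case: the join fires at the ancestor's end tag, so every
    # buffered descendant must belong to it. No ID comparison is needed.
    return [(ancestor, d) for d in buffered]


def recursive_join(ancestors: List[Elem], buffered: List[Elem]) -> List[Pair]:
    # Recursive case: several ancestor candidates may be nested, so each pair
    # is checked by interval containment (the tuple-level ID comparison).
    return [
        (a, d)
        for a in ancestors
        for d in buffered
        if a.start < d.start and d.end < a.end
    ]


def structural_join(ancestors: List[Elem], buffered: List[Elem]) -> List[Pair]:
    # Context-aware switch (assumption: exactly one pending ancestor means
    # the data seen so far is non-recursive for this element).
    if len(ancestors) == 1:
        return just_in_time_join(ancestors[0], buffered)
    return recursive_join(ancestors, buffered)


# Non-recursive fragment: <a><b/><b/></a>
flat = structural_join([Elem("a", 1, 6)], [Elem("b", 2, 3), Elem("b", 4, 5)])

# Recursive fragment: <a><b/><a><b/></a></a>
a_outer, a_inner = Elem("a", 1, 8), Elem("a", 4, 7)
b_first, b_second = Elem("b", 2, 3), Elem("b", 5, 6)
nested = structural_join([a_outer, a_inner], [b_first, b_second])
```

In the recursive fragment, the inner `a` must not be paired with the first `b`, which is why the cheaper strategy is only safe when a single ancestor candidate is pending.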
