Efficiently loading and processing XML streams

XML stream applications pose the novel challenge of efficiently evaluating queries over sequentially accessible, token-based input streams. Our Raindrop project is the first to support token-based stream processing within an algebraic framework that models both tokens and tuples in a uniform manner. In this paper, we show how the stream loading model of our system performs XML navigation over the input stream on the fly while concurrently constructing a minimized, light-weight XML tree representation called the navigation-free data instance. The XML fragments captured in this way are minimized in terms of buffer consumption. Building on this compact representation, we propose techniques for subsequent algebraic query evaluation, in particular effective strategies for supporting multi-mode query operators and alternative data output semantics. The proposed stream loading model requires a much smaller buffer footprint than alternative solutions in the literature, such as Y-Filter. In addition, the proposed algebra-based evaluation techniques handle data recursion over XML streams effectively by avoiding the overhead of structural join operators. Our stream loading and query evaluation techniques have been implemented as part of the Raindrop system, and we report experimental results obtained with it.
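To make the loading idea concrete, the following is a minimal sketch, not Raindrop's implementation, of navigating a token stream while buffering only the subtrees that match a query path. It assumes a hypothetical token format (`("start", tag)`, `("text", value)`, `("end", tag)`), a hypothetical `load_matches` function, and a query restricted to child-axis steps such as `/root/item`. Tokens outside matched elements are discarded as soon as they are read, so the buffer holds only the captured fragments, in the spirit of the navigation-free data instance described above.

```python
from dataclasses import dataclass, field
from typing import Iterable, List, Tuple

# Hypothetical token format: ("start", tag), ("text", value) or ("end", tag).
Token = Tuple[str, str]


@dataclass
class Node:
    """Light-weight tree node; created only for tokens inside matched fragments."""
    tag: str
    children: List["Node"] = field(default_factory=list)
    text: str = ""


def load_matches(tokens: Iterable[Token], path: List[str]) -> List[Node]:
    """Navigate a token stream for a child-axis path (e.g. ["root", "item"])
    and buffer only the subtrees rooted at matching elements.

    Assumption: only child steps are supported; descendant steps ("//")
    and recursive data, which Raindrop handles, are out of scope here.
    """
    depth = 0               # depth of the element currently open in the stream
    matched = 0             # leading path steps matched by the open ancestors
    stack: List[Node] = []  # open nodes of the fragment being buffered
    results: List[Node] = []

    for kind, value in tokens:
        if kind == "start":
            depth += 1
            if stack:
                # Inside a matched fragment: keep this element.
                node = Node(value)
                stack[-1].children.append(node)
                stack.append(node)
            elif matched == depth - 1 and matched < len(path) and path[matched] == value:
                # All ancestors matched so far and this tag matches the next step.
                matched += 1
                if matched == len(path):
                    stack.append(Node(value))  # start buffering a fragment
        elif kind == "text":
            if stack:
                stack[-1].text += value
            # Text outside any fragment is dropped immediately.
        elif kind == "end":
            if stack:
                node = stack.pop()
                if not stack:
                    results.append(node)  # fragment complete
            if matched == depth:
                # The element being closed was one of the matched path steps.
                matched -= 1
            depth -= 1
    return results
```

For example, with the path `["root", "item"]`, an `item` element nested under some other child of `root` is skipped without being buffered, because its ancestors do not match the path step by step. Evaluating multi-mode operators or alternative output semantics over the captured fragments is a separate step that this sketch does not attempt.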
