Buffering in query evaluation over XML streams

All known algorithms for evaluating advanced XPath queries (e.g., ones with predicates or with closure axes) on XML streams employ buffers to temporarily store fragments of the document stream. In many cases, these buffers grow very large and constitute a major memory bottleneck. In this paper, we identify two broad classes of evaluation problems that independently necessitate the use of large memory buffers in evaluation of queries over XML streams: (1) full-fledged evaluation (as opposed to just filtering) of queries with predicates; (2) evaluation (whether full-fledged or filtering) of queries with "multi-variate" predicates.We prove quantitative lower bounds on the amount of memory required in each of these scenarios. The bounds are stated in terms of novel document properties that we define. We show that these scenarios, in combination with query evaluation over recursive documents, cover the cases in which large buffers are required. Finally, we present algorithms that match the lower bounds for an important fragment of XPath.

[1]  Andrew Chi-Chih Yao,et al.  Some complexity questions related to distributive computing(Preliminary Report) , 1979, STOC.

[2]  Noga Alon,et al.  The space complexity of approximating the frequency moments , 1996, STOC '96.

[3]  Prabhakar Raghavan,et al.  Computing on data streams , 1999, External Memory Algorithms.

[4]  Zachary G. Ives,et al.  EÆcient Evaluation of Regular Path Expressions on Streaming XML Data , 2000 .

[5]  Michael J. Franklin,et al.  Efficient Filtering of XML Documents for Selective Dissemination of Information , 2000, VLDB.

[6]  M. Naor,et al.  Optimal aggregation algorithms for middleware , 2001, PODS '01.

[7]  Ziv Bar-Yossef,et al.  An information statistics approach to data stream and communication complexity , 2002, The 43rd Annual IEEE Symposium on Foundations of Computer Science, 2002. Proceedings..

[8]  Rajeev Rastogi,et al.  Efficient filtering of XML documents with XPath expressions , 2002, The VLDB Journal.

[9]  Jennifer Widom,et al.  Characterizing memory requirements for queries over continuous data streams , 2002, PODS '02.

[10]  Yanlei Diao,et al.  YFilter: efficient and scalable filtering of XML documents , 2002, Proceedings 18th International Conference on Data Engineering.

[11]  Dan Suciu,et al.  XMLTK: An XML Toolkit for Scalable XML Stream Processing , 2002 .

[12]  Jennifer Widom,et al.  Models and issues in data stream systems , 2002, PODS.

[13]  Michael E. Saks,et al.  Space lower bounds for distance approximation in the data stream model , 2002, STOC '02.

[14]  S. Muthukrishnan,et al.  Data streams: algorithms and applications , 2005, SODA '03.

[15]  Georg Gottlob,et al.  The complexity of XPath query evaluation , 2003, PODS.

[16]  Amélie Marian,et al.  Projecting XML Documents , 2003, VLDB.

[17]  Luc Segoufin,et al.  Typing and querying XML documents: some complexity bounds , 2003, PODS.

[18]  Dan Suciu,et al.  Processing XML Streams with Deterministic Automata , 2003, ICDT.

[19]  Dan Suciu,et al.  Stream processing of XPath queries with predicates , 2003, SIGMOD '03.

[20]  Sudarshan S. Chawathe,et al.  XPath queries on streaming data , 2003, SIGMOD '03.

[21]  David P. Woodruff,et al.  Tight lower bounds for the distinct elements problem , 2003, 44th Annual IEEE Symposium on Foundations of Computer Science, 2003. Proceedings..

[22]  François Bry,et al.  An evaluation of regular path expressions with qualifiers against XML streams , 2003, Proceedings 19th International Conference on Data Engineering (Cat. No.03CH37405).

[23]  David P. Woodruff Optimal space lower bounds for all frequency moments , 2004, SODA '04.

[24]  Marcus Fontoura,et al.  On the memory requirements of XPath evaluation over XML streams , 2004, PODS.

[25]  Marcus Fontoura,et al.  Querying XML streams , 2005, The VLDB Journal.

[26]  Stefanie Scherzinger,et al.  Schema-based Scheduling of Event Processors and Buffer Minimization for Queries on Structured Data Streams , 2004, VLDB.