On the memory requirements of XPath evaluation over XML streams

The important challenge of evaluating XPath queries over XML streams has sparked much interest in the past two years. A number of algorithms have been proposed, supporting wider fragments of the query language and exhibiting better performance and memory utilization. Nevertheless, all the algorithms known to date use a prohibitively large amount of memory for certain types of queries. A natural question, then, is whether this memory bottleneck is inherent or just an artifact of the proposed algorithms. In this paper we initiate the first systematic and theoretical study of lower bounds on the amount of memory required to evaluate XPath queries over XML streams. We present a general lower bound technique which, given a query, specifies the minimum amount of memory that any algorithm evaluating the query on a stream would need to use. The lower bounds are stated in terms of new graph-theoretic properties of queries. The proof is based on tools from communication complexity. We then exploit insights learned from the lower bounds to obtain a new algorithm for XPath evaluation on streams. The algorithm uses space close to the optimum. Our algorithm deviates from the standard paradigm of using automata or transducers, thereby avoiding the need to store large transition tables.
