On the memory requirements of XPath evaluation over XML streams

The important challenge of evaluating XPath queries over XML streams has sparked much interest in the past few years. A number of algorithms have been proposed, supporting wider fragments of the query language and exhibiting better performance and memory utilization. Nevertheless, all the algorithms known to date use a prohibitively large amount of memory for certain types of queries. A natural question is whether this memory bottleneck is inherent or just an artifact of the proposed algorithms. In this paper we initiate a systematic theoretical study of lower bounds on the amount of memory required to evaluate XPath queries over XML streams. We present a general lower bound technique which, given a query, specifies the minimum amount of memory that any algorithm evaluating the query on a stream must use. The lower bounds are stated in terms of new graph-theoretic properties of queries, and the proofs are based on tools from communication complexity. We then exploit insights from the lower bounds to obtain a new algorithm for XPath evaluation on streams, which uses space close to the optimum. Our algorithm deviates from the standard paradigm of using automata or transducers, thereby avoiding the need to store large transition tables.
