Efficient algorithms for evaluating xpath over streams

In this paper we address the problem of evaluating XPath queries over streaming XML data. We consider a practical XPath fragment called Univariate XPath, which includes the commonly used '/' and '//' axes and allows *-node tests and arbitrarily nested predicates. It is well known that this XPath fragment can be efficiently evaluated in O(|D||Q|) time in the non-streaming environment, where |D| is the document size and |Q| is the query size. However, this is not necessarily true in the streaming environment, since streaming algorithms have to satisfy stricter requirement than non-streaming algorithms, in that all data must be read sequentially in one pass. Therefore, it is not surprising that state-of-the-art stream-querying algorithms have higher time complexity than O(|D||Q|). In this paper we revisit the XPath stream-querying problem, and show that Univariate XPath can be efficiently evaluated in O|D||Q|) time in the streaming environment. Specifically, we propose two O(|D||Q|)-time stream-querying algorithms, LQ and EQ, which are based on the lazy strategy and on the eager strategy, respectively. To the best of our knowledge, LQ and EQ are the first XPath stream-querying algorithms that achieve O(|D||Q|) time performance. Further, our algorithms achieve O(|D||Q|) time performance without trading off space performance. Instead, they have better buffering-space performance than state-of-the-art stream-querying algorithms. In particular, EQ achieves optimal buffering-space performance. Our experimental results show that our algorithms have not only good theoretical complexity but also considerable practical performance advantages over existing algorithms.

[1]  Stefanie Scherzinger,et al.  Schema-based Scheduling of Event Processors and Buffer Minimization for Queries on Structured Data Streams , 2004, VLDB.

[2]  Michael J. Carey,et al.  The BEA/XQRL Streaming XQuery Processor , 2003, VLDB.

[3]  Marcus Fontoura,et al.  Querying XML streams , 2005, The VLDB Journal.

[4]  Jignesh M. Patel,et al.  Structural joins: a primitive for efficient XML query pattern matching , 2002, Proceedings 18th International Conference on Data Engineering.

[5]  Scott Boag,et al.  XQuery 1.0 : An XML Query Language , 2007 .

[6]  Rajeev Rastogi,et al.  Efficient filtering of XML documents with XPath expressions , 2002, The VLDB Journal.

[7]  Byron Choi,et al.  What are real DTDs like? , 2002, WebDB.

[8]  Georg Gottlob,et al.  The complexity of XPath query evaluation , 2003, PODS.

[9]  Dan Suciu,et al.  Processing XML streams with deterministic automata and stream indexes , 2004, TODS.

[10]  Hao Zhang,et al.  Path sharing and predicate evaluation for high-performance XML filtering , 2003, TODS.

[11]  Luc Segoufin,et al.  Typing and querying XML documents: some complexity bounds , 2003, PODS.

[12]  Marcus Fontoura,et al.  Buffering in query evaluation over XML streams , 2005, PODS '05.

[13]  Dan Suciu,et al.  Stream processing of XPath queries with predicates , 2003, SIGMOD '03.

[14]  Marcus Fontoura,et al.  On the memory requirements of XPath evaluation over XML streams , 2004, PODS.

[15]  Sudarshan S. Chawathe,et al.  XSQ: A streaming XPath engine , 2005, TODS.

[16]  Marcus Fontoura,et al.  Streaming XPath processing with forward and backward axes , 2003, Proceedings 19th International Conference on Data Engineering (Cat. No.03CH37405).

[17]  Steven J. DeRose,et al.  XML Path Language (XPath) Version 1.0 , 1999 .

[18]  Susan B. Davidson,et al.  An Efficient XPath Query Processor for XML Streams , 2006, 22nd International Conference on Data Engineering (ICDE'06).

[19]  Bertram Ludäscher,et al.  A Transducer-Based XML Query Processor , 2002, VLDB.

[20]  Divesh Srivastava,et al.  Holistic twig joins: optimal XML pattern matching , 2002, SIGMOD '02.