Evaluating XPath Queries on XML Data Streams

When queries must be evaluated on XML data streams, or when the memory available for processing is small relative to the document, DOM-based approaches, which have to load and store large parts of the document in main memory, will fail. In contrast, we present an approach to evaluating XPath queries on SAX streams that supports all axes of core XPath, including the sibling axes. Starting from the XPath query, our approach generates a stack of automata that takes the SAX stream as input and produces the query result as an output SAX stream. An evaluation of our implementation shows that, in general, our approach needs less main memory and at the same time is faster than both Saxon and YFilter.
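To illustrate the general idea of automaton-based evaluation over a SAX stream, the following is a minimal sketch, not the approach of this paper. It assumes a heavily simplified query language (only child and descendant steps with name tests, no predicates, no sibling or backward axes), and it collects the paths of matching elements instead of emitting an output SAX stream. The names `PathMatcher` and `evaluate` and the `(axis, name)` step encoding are hypothetical.

```python
import xml.sax


class PathMatcher(xml.sax.ContentHandler):
    """Match a simple path against SAX events using a stack of state sets.

    Illustrative assumption: `steps` is a list of (axis, name) pairs with
    axis in {"child", "descendant"}. State i means "the first i steps have
    been matched by the current element or one of its ancestors".
    """

    def __init__(self, steps):
        super().__init__()
        self.steps = steps
        self.stack = [{0}]   # one state set per open element, plus the root
        self.path = []       # names of the currently open elements
        self.matches = []    # paths of elements that match the whole query

    def startElement(self, name, attrs):
        self.path.append(name)
        next_states = set()
        for state in self.stack[-1]:
            if state == len(self.steps):
                continue  # a final state is not inherited by children
            axis, test = self.steps[state]
            if axis == "descendant":
                next_states.add(state)      # the step may match deeper down
            if test == name or test == "*":
                next_states.add(state + 1)  # this element matches the step
        if len(self.steps) in next_states:
            self.matches.append("/" + "/".join(self.path))
        self.stack.append(next_states)

    def endElement(self, name):
        # Leaving an element discards its state set; memory use is bounded
        # by the document depth, not by the document size.
        self.stack.pop()
        self.path.pop()


def evaluate(xml_bytes, steps):
    """Parse `xml_bytes` as a SAX stream and return the matching paths."""
    handler = PathMatcher(steps)
    xml.sax.parseString(xml_bytes, handler)
    return handler.matches


if __name__ == "__main__":
    doc = b"<a><b><c/></b><c/><d><b><c/></b></d></a>"
    # Roughly corresponds to the XPath query /a//b/c
    print(evaluate(doc, [("child", "a"), ("descendant", "b"), ("child", "c")]))
```

Because the handler keeps only one state set per open element, the sketch never materializes the document tree, which is the property that distinguishes streaming evaluation from DOM-based evaluation.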
