Scalable Filtering of Multiple Generalized-Tree-Pattern Queries over XML Streams

An XML publish/subscribe system must filter a large number of queries over XML streams. Most existing systems consider only simple XPath expressions. In this paper, we focus on filtering the more complex generalized-tree-pattern (GTP) queries. Our filtering mechanism is based on a novel Tree-of-Path (TOP) encoding scheme, which compactly represents the path matches for the entire document. First, we show that TOP encodings can be produced efficiently via shared bottom-up path matching. Second, with the aid of the TOP encoding, we can (1) achieve polynomial time and space complexity for post-processing, (2) avoid redundant predicate evaluations, (3) support an efficient, duplicate-free, merge-join-based algorithm for merging multiple encoded path matches, and (4) simplify the processing of GTP queries. Overall, our approach maximizes sharing opportunities across queries by exploiting both suffix and prefix sharing, while the TOP encodings keep post-processing of GTP queries efficient. Extensive performance studies show that our GFilter solution not only achieves significantly better filtering performance than state-of-the-art algorithms, but can also efficiently filter the more complex GTP queries.
