Prefilter: predicate pushdown at streaming speeds

This paper presents the prefilter: a predicate pushdown framework for a Data Stream Management System (DSMS). Though early predicate evaluation is a well-known query optimization strategy, novel problems arise in a high-performance DSMS. In particular, (i) query invocation costs are high compared to the cost of evaluating the simple predicates that are often used in high-speed stream analysis; (ii) selectivity estimates may become inaccurate over time; and (iii) multiple queries, possibly containing common subexpressions, must be processed continuously. The prefilter addresses these issues by constructing appropriate predicates for early evaluation, which takes place as soon as new data arrive and before any queries are invoked. It also compresses the bit vector representing the outcomes of pushed-down predicates over newly arrived tuples, and uses the compressed bitmap to efficiently check which queries do not have to be invoked. Using a set of network monitoring queries, we show that the prefilter significantly improves the performance of the Gigascope DSMS.
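The bit-vector check described above can be illustrated with a minimal Python sketch. It is not the Gigascope implementation: the predicates, the field names (`protocol`, `dest_port`, `length`), the query names, and the helper functions are all hypothetical, and the bitmap compression step is omitted. Each shared predicate is evaluated once per tuple, its outcome is recorded as one bit, and a query is invoked only if every bit in its mask is set.

```python
from typing import Callable, Dict, List

Packet = Dict[str, int]
Predicate = Callable[[Packet], bool]

# Hypothetical pushed-down predicates; the list index is the bit position.
PREDICATES: List[Predicate] = [
    lambda p: p["protocol"] == 6,    # bit 0: TCP
    lambda p: p["dest_port"] == 80,  # bit 1: destination port 80
    lambda p: p["length"] > 1000,    # bit 2: large packet
]

# Hypothetical queries; each mask lists the predicate bits the query requires.
QUERY_MASKS: Dict[str, int] = {
    "http_traffic": 0b011,   # needs bits 0 and 1
    "large_tcp": 0b101,      # needs bits 0 and 2
    "large_packets": 0b100,  # needs bit 2 only
}


def prefilter_bits(packet: Packet) -> int:
    """Evaluate every shared predicate once and pack the outcomes into an int."""
    bits = 0
    for position, predicate in enumerate(PREDICATES):
        if predicate(packet):
            bits |= 1 << position
    return bits


def queries_to_invoke(packet: Packet) -> List[str]:
    """Return only the queries whose required predicate bits are all set."""
    bits = prefilter_bits(packet)
    return [
        name for name, mask in QUERY_MASKS.items()
        if (bits & mask) == mask
    ]


if __name__ == "__main__":
    small_http = {"protocol": 6, "dest_port": 80, "length": 200}
    print(queries_to_invoke(small_http))
```

In this sketch a predicate shared by several queries (here, the TCP test) is evaluated once per tuple rather than once per query, and queries whose masks do not match are never invoked for that tuple.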
