Efficient Construction of Compact Shedding Filters for Data Stream Processing

High-volume source streams, coupled with fluctuating rates, necessitate adaptive load shedding in data stream processing. Without it, a continual query (CQ) server may randomly drop items when its capacity is inadequate to handle the arriving data, degrading the quality of the query results. To alleviate this problem, filters can be used at the source nodes. However, regular source filtering by itself is not sufficient to prevent random dropping, because the amount of data passing through the filters can still surpass the server's capacity. In this case, the source filters can apply intelligent load shedding to minimize the degradation in result quality. In this paper, we introduce a novel type of load-shedding source filter, called the non-uniformly regulated (NR) sifter. An NR sifter judiciously applies varying amounts of load shedding to different regions of the data space within the sifter. We formulate the construction of NR sifters as an optimization problem. NR sifters are compact and quickly configurable, allowing frequent adaptation, and provide fast lookup for deciding whether a data item should be dropped. We structure NR sifters as a set of (sifter region, drop threshold) pairs to achieve compactness, develop query consolidation techniques to enable quick construction, and introduce flexible space partitioning mechanisms to realize fast lookup.
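The (sifter region, drop threshold) structure can be illustrated with a minimal sketch. It is not the paper's construction: it assumes a one-dimensional data space split into contiguous intervals, and it interprets each drop threshold as a drop probability in [0, 1]. The class name `NRSifterSketch` and its methods are hypothetical, and the thresholds are supplied by hand rather than derived from the optimization formulation.

```python
import bisect
import random


class NRSifterSketch:
    """Illustrative 1-D sifter: a list of (region, drop threshold) pairs.

    Assumption: regions are contiguous intervals defined by sorted split
    points, and a threshold is the probability of dropping an item that
    falls in that region. Both choices are made for illustration only.
    """

    def __init__(self, boundaries, thresholds, rng=None):
        boundaries = list(boundaries)
        thresholds = list(thresholds)
        if boundaries != sorted(boundaries):
            raise ValueError("boundaries must be sorted")
        # k split points define k + 1 regions.
        if len(thresholds) != len(boundaries) + 1:
            raise ValueError("need exactly one threshold per region")
        if any(not 0.0 <= t <= 1.0 for t in thresholds):
            raise ValueError("thresholds must lie in [0, 1]")
        self.boundaries = boundaries
        self.thresholds = thresholds
        self.rng = rng or random.Random()

    def region_of(self, value):
        """Return the index of the region containing value (binary search)."""
        return bisect.bisect_right(self.boundaries, value)

    def should_drop(self, value):
        """Decide whether to shed an item, using its region's threshold."""
        return self.rng.random() < self.thresholds[self.region_of(value)]


# Three regions: (-inf, 10), [10, 20), [20, +inf).
sifter = NRSifterSketch([10, 20], [0.0, 1.0, 0.5], rng=random.Random(7))
kept = [v for v in (3, 12, 18, 25, 40) if not sifter.should_drop(v)]
```

Storing one threshold per region keeps the sifter's size proportional to the number of regions rather than the number of queries, and the binary search keeps each lookup logarithmic in that number. A multi-dimensional data space would need a different partitioning structure than the sorted list assumed here.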
