QUAY: a data stream processing system using chunking

Data stream processing has emerged as a research direction for a new generation of database applications in which data records flow continuously from remote source sites to a processing site. Queries residing at the processing site are triggered and evaluated as data records of interest to them arrive. Two aspects distinguish data stream processing systems from conventional database systems. First, the roles of queries and data records are swapped: queries are stationary while data records are dynamic, so query indexing becomes an essential determinant of performance. Second, the expected high data flow rate aggravates the overhead of data index maintenance. To address these problems, we propose and develop a data stream processing system called QUAY, and we present its design, implementation, and evaluation. The core technique is "chunking", which clusters and indexes both queries and data records in a unified way as chunks. To process window join operations over stream sources, we propose an adaptive selection-join arrangement that lets a large number of selection-join queries share expensive join operations. Through a set of intensive performance evaluation experiments, we show that the chunking organization, operating under the proposed adaptive selection-join arrangement, yields good performance.
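
To make the idea of indexing queries and data records together more concrete, the following is a minimal sketch, not QUAY's actual design. It assumes a one-dimensional attribute, a uniform grid of fixed-width chunks, and range selection queries; the names `RangeQuery`, `ChunkIndex`, `register`, and `insert` are hypothetical and do not come from the paper. Each chunk holds both the queries whose range overlaps it and the records that fall into it, so an arriving record is checked only against the queries in its own chunk.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class RangeQuery:
    """A stationary selection query lo <= value < hi (hypothetical form)."""
    name: str
    lo: float
    hi: float

    def matches(self, value: float) -> bool:
        return self.lo <= value < self.hi


class ChunkIndex:
    """Uniform 1-D grid (an assumption); each chunk stores the queries
    overlapping it and the records that fall inside it."""

    def __init__(self, chunk_width: float):
        self.chunk_width = chunk_width
        self.queries = defaultdict(list)   # chunk id -> registered queries
        self.records = defaultdict(list)   # chunk id -> stored record values

    def _chunk_of(self, value: float) -> int:
        return int(value // self.chunk_width)

    def register(self, query: RangeQuery) -> None:
        # Register the query in every chunk its range touches.
        first = self._chunk_of(query.lo)
        last = self._chunk_of(query.hi)
        for chunk_id in range(first, last + 1):
            self.queries[chunk_id].append(query)

    def insert(self, value: float) -> list:
        # Store the record in its chunk, then test only that chunk's queries.
        chunk_id = self._chunk_of(value)
        self.records[chunk_id].append(value)
        candidates = self.queries.get(chunk_id, [])
        return [q.name for q in candidates if q.matches(value)]


index = ChunkIndex(chunk_width=10.0)
index.register(RangeQuery("q1", 5, 25))
index.register(RangeQuery("q2", 12, 18))
index.register(RangeQuery("q3", 40, 50))

hits_15 = index.insert(15)   # chunk 1 holds q1 and q2
hits_45 = index.insert(45)   # chunk 4 holds only q3
hits_30 = index.insert(30)   # chunk 3 holds no queries
```

In this sketch, a record is never compared with a query registered only in other chunks, which is the kind of saving a shared chunk organization aims for. How QUAY actually sizes, clusters, and adapts its chunks is not specified in the abstract.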
