Scalability via Summaries: Stream Query Processing Using Promising Tuples

Abstract

In many data streaming applications, streams may contain data tuples that are redundant, repetitive, or not "interesting" to any of the standing continuous queries. Processing such tuples may waste system resources without producing useful answers. In contrast, other tuples can be categorized as promising. This paper proposes that stream query engines have the option to execute on promising tuples only, rather than on all tuples. We propose to maintain intermediate stream summaries and indices that can direct the stream query engine to detect and operate on promising tuples. As an illustration, the proposed intermediate stream summaries are tuned towards capturing promising tuples that (1) maximize the number of output tuples, (2) contribute to producing a faithful representative sample of the output tuples (compared to the output produced when assuming infinite resources), or (3) produce outlier or deviant results. Experiments are conducted in the context of Nile [14], a prototype stream query processing engine developed at Purdue University.
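To make criterion (1) concrete, the following is a minimal Python sketch under our own assumptions, not the paper's method: a two-stream equi-join over count-based sliding windows, where the "summary" is simply an exact count of join-key values in the opposite stream's window. `WindowSummary` and `select_promising` are hypothetical names, not Nile's API, and the paper's actual summaries and indices may differ. A tuple is ranked by how many join partners it would find, so under a processing budget the engine spends its effort on the tuples expected to yield the most output.

```python
from collections import Counter, deque


class WindowSummary:
    """Hypothetical summary: join-key counts over the last `window` tuples of one stream."""

    def __init__(self, window):
        self.window = window
        self.tuples = deque()   # join keys currently inside the sliding window
        self.counts = Counter()  # join key -> number of occurrences in the window

    def insert(self, key):
        # Add the new tuple's key, then evict the oldest tuple if the window overflows.
        self.tuples.append(key)
        self.counts[key] += 1
        if len(self.tuples) > self.window:
            old = self.tuples.popleft()
            self.counts[old] -= 1
            if self.counts[old] == 0:
                del self.counts[old]

    def expected_matches(self, key):
        # Number of join partners a tuple with this key would find in the window.
        return self.counts.get(key, 0)


def select_promising(batch, other_summary, budget):
    """Keep at most `budget` keys from `batch`, preferring those with the most
    matches in the opposite window; keys with no match are never processed."""
    ranked = sorted(batch, key=other_summary.expected_matches, reverse=True)
    return [k for k in ranked[:budget] if other_summary.expected_matches(k) > 0]
```

For example, after inserting keys `a, b, a, c` into a window of size 4, `select_promising(["c", "x", "a", "b"], s, budget=2)` keeps `a` (two partners) and `c` (one partner) and drops `x`, which would produce no output. Exact counts cost memory proportional to the number of distinct keys in the window; a space-constrained engine would plausibly swap in an approximate frequency summary such as those studied in [18], at the price of occasionally misranking tuples.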

[1] Rajeev Motwani, et al. On random sampling over joins, 1999, SIGMOD '99.

[2] Jennifer Widom, et al. Query Processing, Resource Management, and Approximation in a Data Stream Management System, 2002.

[3] Noga Alon, et al. The space complexity of approximating the frequency moments, 1996, STOC '96.

[4] Shai Ben-David, et al. Detecting Change in Data Streams, 2004, VLDB.

[5] Jeffrey Scott Vitter, et al. Approximate computation of multidimensional aggregates of sparse data using wavelets, 1999, SIGMOD '99.

[6] Wei Hong, et al. Model-Driven Data Acquisition in Sensor Networks, 2004, VLDB.

[7] Philippe Bonnet, et al. Towards Sensor Database Systems, 2001, Mobile Data Management.

[8] Kyuseok Shim, et al. Approximate query processing using wavelets, 2001, The VLDB Journal.

[9] Jeffrey F. Naughton, et al. Static optimization of conjunctive queries with sliding windows over infinite streams, 2004, SIGMOD '04.

[10] Abhinandan Das, et al. Semantic approximation of data stream joins, 2005, IEEE Transactions on Knowledge and Data Engineering.

[11] Jennifer Widom, et al. Memory-Limited Execution of Windowed Stream Joins, 2004, VLDB.

[12] Walid G. Aref, et al. Scheduling for shared window joins over data streams, 2003, VLDB.

[13] Dimitrios Gunopulos, et al. Online amnesic approximation of streaming time series, 2004, Proceedings of the 20th International Conference on Data Engineering.

[14] Walid G. Aref, et al. Nile: a query processing engine for data streams, 2004, Proceedings of the 20th International Conference on Data Engineering.

[15] Frederick Reiss, et al. TelegraphCQ: Continuous Dataflow Processing for an Uncertain World, 2003, CIDR.

[16] Sudipto Guha, et al. Data-streams and histograms, 2001, STOC '01.

[17] Noga Alon, et al. The Space Complexity of Approximating the Frequency Moments, 1999.

[18] Moses Charikar, et al. Finding frequent items in data streams, 2004, Theor. Comput. Sci.

[19] Jeffrey F. Naughton, et al. Evaluating window joins over unbounded streams, 2003, Proceedings of the 19th International Conference on Data Engineering.

[20] Qiang Chen, et al. Aurora: a new model and architecture for data stream management, 2006.

[21] Moses Charikar, et al. Finding frequent items in data streams, 2002, Theor. Comput. Sci.

[22] Viswanath Poosala, et al. Congressional samples for approximate answering of group-by queries, 2000, SIGMOD '00.

[23] Jennifer Widom, et al. Models and issues in data stream systems, 2002, PODS.

[24] Lukasz Golab, et al. Processing Sliding Window Multi-Joins in Continuous Queries over Data Streams, 2003, VLDB.

[25] Dimitrios Gunopulos, et al. Temporal Aggregation over Data Streams Using Multiple Granularities, 2002, EDBT.

[26] David J. DeWitt, et al. NiagaraCQ: a scalable continuous query system for Internet databases, 2000, SIGMOD '00.

[27] Theodore Johnson, et al. Gigascope: a stream database for network applications, 2003, SIGMOD '03.

[28] Jennifer Widom, et al. Characterizing memory requirements for queries over continuous data streams, 2002, PODS '02.

[29] Walid G. Aref, et al. Stream window join: tracking moving objects in sensor-network databases, 2003, 15th International Conference on Scientific and Statistical Database Management.

[30] Sridhar Ramaswamy, et al. Join synopses for approximate query answering, 1999, SIGMOD '99.

[31] Samuel Madden, et al. Fjording the stream: an architecture for queries over streaming sensor data, 2002, Proceedings of the 18th International Conference on Data Engineering.

[32] Peter M. G. Apers, et al. Pipelining in query execution, 1990, Proceedings of PARBASE-90: International Conference on Databases, Parallel Architectures, and Their Applications.

[33] Theodosios Pavlidis, et al. A hierarchical data structure for picture processing, 1975.

[34] Jennifer Widom, et al. Incremental computation and maintenance of temporal aggregates, 2003, The VLDB Journal.

[35] David J. DeWitt, et al. NiagaraCQ: a scalable continuous query system for Internet databases, 2000, SIGMOD '00.

[36] Piotr Indyk, et al. Maintaining Stream Statistics over Sliding Windows, 2002, SIAM J. Comput.

[37] Rajeev Motwani, et al. Sampling from a moving window over streaming data, 2002, SODA '02.