Incremental Aggregation on Multiple Continuous Queries

Continuously monitoring large-scale aggregates over data streams is important for many stream processing applications, e.g., collaborative intelligence analysis, and presents new challenges to data management systems. The first challenge is to efficiently compute the updated aggregate values and deliver the new results to users as new tuples arrive. We implemented an incremental aggregation mechanism that does this for arbitrary algebraic aggregate functions, including user-defined ones, by maintaining up-to-date finite data summaries. The second challenge is to construct shared query evaluation plans that support a large number of queries effectively. Since multiple query optimization is NP-complete and queries generally arrive asynchronously, we apply an incremental sharing approach to obtain shared plans that perform reasonably well. The system is built as part of ARGUS, a stream processing system that runs atop a DBMS. Our evaluation shows that these approaches are effective and efficient on typical collaborative intelligence analysis data and queries.
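The abstract does not specify the summary format, so the following is only a minimal sketch of the general idea, not the ARGUS implementation. It assumes that an algebraic aggregate can be described by a fixed-size state together with hypothetical `init`, `step`, `merge`, and `final` functions. Each arriving tuple then updates its group's summary in O(1), without rescanning the stream. The `rollup` method is likewise an illustrative assumption: it shows how finer-grained summaries could be merged to answer a coarser GROUP BY, which is one plausible form of sharing work across queries.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Hashable, Tuple

State = Tuple[float, ...]  # fixed-size summary, independent of stream length


@dataclass(frozen=True)
class AlgebraicAggregate:
    """Hypothetical spec: fixed-size state plus init/step/merge/final functions."""
    name: str
    init: Callable[[], State]
    step: Callable[[State, float], State]   # fold in one new value
    merge: Callable[[State, State], State]  # combine summaries of disjoint inputs
    final: Callable[[State], float]         # compute the answer from the summary


# AVG is algebraic: its summary is (sum, count).
AVG = AlgebraicAggregate(
    "avg",
    init=lambda: (0.0, 0),
    step=lambda st, x: (st[0] + x, st[1] + 1),
    merge=lambda a, b: (a[0] + b[0], a[1] + b[1]),
    final=lambda st: st[0] / st[1],
)

# Population variance is algebraic: its summary is (count, sum, sum of squares).
VARIANCE = AlgebraicAggregate(
    "var_pop",
    init=lambda: (0, 0.0, 0.0),
    step=lambda st, x: (st[0] + 1, st[1] + x, st[2] + x * x),
    merge=lambda a, b: (a[0] + b[0], a[1] + b[1], a[2] + b[2]),
    final=lambda st: st[2] / st[0] - (st[1] / st[0]) ** 2,
)


class IncrementalGroupBy:
    """Keeps one finite summary per group; each new tuple costs O(1)."""

    def __init__(self, agg: AlgebraicAggregate) -> None:
        self.agg = agg
        self.states: Dict[Hashable, State] = {}

    def insert(self, key: Hashable, value: float) -> float:
        # Update only the affected group's summary and return its new result.
        state = self.agg.step(self.states.get(key, self.agg.init()), value)
        self.states[key] = state
        return self.agg.final(state)

    def result(self, key: Hashable) -> float:
        return self.agg.final(self.states[key])

    def rollup(self, coarsen: Callable[[Hashable], Hashable]) -> Dict[Hashable, float]:
        # Derive a coarser grouping by merging finer summaries (no tuple rescan).
        merged: Dict[Hashable, State] = {}
        for key, state in self.states.items():
            ck = coarsen(key)
            merged[ck] = self.agg.merge(merged[ck], state) if ck in merged else state
        return {ck: self.agg.final(st) for ck, st in merged.items()}
```

For example, feeding `2, 4, 4, 4, 5, 5, 7, 9` into `IncrementalGroupBy(VARIANCE)` under a single key yields 4.0 after the last tuple. A query grouped by `(region, channel)` can also answer a per-`region` AVG through `rollup(lambda k: k[0])`. The sum-of-squares formula is simple but can lose precision on large values; Welford's method is a more numerically stable alternative with the same O(1) update cost.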
