F1: Accelerating the Optimization of Aggregate Continuous Queries

Data Stream Management Systems performing on-line analytics rely on the efficient execution of large numbers of Aggregate Continuous Queries (ACQs). The state-of-the-art WeaveShare optimizer uses the Weavability concept to selectively combine ACQs for partial aggregation and to produce high-quality execution plans. However, WeaveShare does not scale well with the number of ACQs. In this paper, we propose a novel closed formula, F1, that accelerates Weavability calculations and thus allows WeaveShare to achieve exceptional scalability in systems with heavy workloads. More generally, F1 can reduce the computation time of any technique that combines partial aggregations within the composite slides of multiple ACQs. We theoretically analyze the Bit Set approach currently used by WeaveShare and show that F1 is superior in both time and space complexity. We show that F1 performs 10^62 times fewer operations than Bit Set to produce the same execution plan for the same input. We experimentally show that F1 executes up to 60,000 times faster and can handle 1,000,000 ACQs in a setting where the current technique is limited to 550.
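The abstract does not state F1 itself, so the following is only a rough illustration of why marking positions one by one becomes expensive. It is a minimal sketch under hypothetical assumptions: each ACQ is reduced to an integer slide `s`, the composite slide of a group is taken to be the LCM of its slides, and each ACQ marks a partial-aggregation edge at every multiple of its slide (range-induced fragment edges are ignored). `edges_bitset` mirrors the per-position marking idea behind a Bit Set approach, while `edges_inclusion_exclusion` counts the same edges arithmetically as a cross-check. Neither function is WeaveShare's code or the F1 formula; all names are hypothetical.

```python
from functools import reduce
from itertools import combinations
from math import gcd


def lcm(a: int, b: int) -> int:
    """Least common multiple of two positive integers."""
    return a // gcd(a, b) * b


def composite_slide(slides: list[int]) -> int:
    """Hypothetical composite slide length: the LCM of all slides in the group."""
    return reduce(lcm, slides, 1)


def edges_bitset(slides: list[int]) -> int:
    """Count distinct edges by marking each position of one composite slide.

    Assumption: an ACQ with slide s places an edge at every multiple of s
    in [0, L). One byte per position is used for simplicity, so memory is
    O(L) and time is O(L + sum(L / s)).
    """
    length = composite_slide(slides)
    marks = bytearray(length)  # marks[p] == 1 means some ACQ has an edge at p
    for s in slides:
        for pos in range(0, length, s):
            marks[pos] = 1
    return sum(marks)


def edges_inclusion_exclusion(slides: list[int]) -> int:
    """Count the same edges without an array of size L.

    Each non-empty subset T of slides contributes (-1)^(|T|+1) * L / lcm(T).
    There are 2^n - 1 subsets, so this is only a small-n cross-check;
    it is NOT the F1 formula from the paper.
    """
    length = composite_slide(slides)
    total = 0
    for k in range(1, len(slides) + 1):
        for subset in combinations(slides, k):
            total += (-1) ** (k + 1) * (length // composite_slide(list(subset)))
    return total


if __name__ == "__main__":
    slides = [2, 3]
    print(composite_slide(slides))            # 6
    print(edges_bitset(slides))               # 4 (edges at 0, 2, 3, 4)
    print(edges_inclusion_exclusion(slides))  # 4
```

With slides 7, 11 and 13 the composite slide already spans 1001 positions. Because the marking approach's time and memory follow the LCM-driven length L rather than the number of ACQs, the sketch is consistent with the abstract's point that a closed formula can beat Bit Set in both time and space complexity.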
