Optimization of Continuous Queries in Federated Database and Stream Processing Systems

The constantly growing number of connected devices and sensors is increasing both the volume and the velocity of sensor-based streaming data. Traditional approaches to processing high-velocity sensor data rely on stream processing engines. However, the increasing complexity of continuous queries executed over high-velocity data has created growing demand for federated systems composed of data stream processing engines and database engines. One of the major challenges for such systems is devising the optimal query execution plan to maximize the throughput of continuous queries. In this paper, we present a general framework for federated database and stream processing systems, and introduce the design and implementation of a cost-based optimizer for relational continuous queries in such systems. Our optimizer uses characteristics of continuous queries and source data streams to devise an optimal placement for each operator of a continuous query. This fine-grained optimization, combined with estimating the feasibility of query plans, allows our optimizer to devise query plans that achieve 8 times higher throughput than the baseline approach, which uses only stream processing engines. Moreover, our experimental results show that even for simple queries, a hybrid execution plan can achieve 4 times and 1.6 times higher throughput than a pure stream processing engine plan and a pure database engine plan, respectively.
