SQPR: Stream query planning with reuse

When users submit new queries to a distributed stream processing system (DSPS), a query planner must allocate physical resources, such as CPU cores, memory and network bandwidth, from a set of hosts to queries. Allocation decisions must provide the correct mix of resources required by each query while keeping the overall allocation efficient, so that the system scales in the number of admitted queries. By exploiting overlap between queries and reusing partial results, a query planner can conserve resources, but it has to make more complex planning decisions. In this paper, we describe SQPR, a query planner that targets DSPSs in data centre environments with heterogeneous resources. SQPR models query admission, allocation and reuse as a single constrained optimisation problem and solves an approximate version to achieve scalability. It prevents individual resources from becoming bottlenecks by re-planning past allocation decisions and supports different allocation objectives. As our experimental evaluation in comparison with a state-of-the-art planner shows, SQPR makes efficient resource allocation decisions, even at high resource utilisation, with acceptable overheads.
