A Distributed Stream Query Optimization Framework through Integrated Planning and Deployment

This paper addresses the problem of optimizing multiple stream queries that execute simultaneously in a distributed data stream system. We argue that the static query optimization approach of "plan, then deployment" is inadequate for distributed queries that involve multiple streams and must cope with the node dynamics of such systems and their applications. Selecting an optimal execution plan in these dynamic, networked environments therefore requires considering operator ordering, reuse, network placement, and search space reduction together. We propose using hierarchical network partitions to exploit opportunities for operator-level reuse, while using network characteristics to keep the search space manageable during query planning and deployment. We develop top-down, bottom-up, and hybrid algorithms that exploit operator-level reuse through hierarchical network partitions, and we present a formal analysis that establishes bounds on the search space and on the suboptimality of these algorithms. We have implemented the algorithms in IFLOW, an adaptive distributed stream management system. Through simulations and experiments with a prototype deployed on Emulab, we demonstrate the effectiveness of the framework and its algorithms.
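
As a rough illustration of how a partition hierarchy can bound the search for reusable operators, the sketch below walks from a query's local partition toward the root and reuses the first matching operator it finds. It is not the paper's algorithm or IFLOW's API: the `Partition` class, the `find_reusable` and `deploy` functions, the operator signature tuples, and the host names are all hypothetical, and advertising each deployed operator to every ancestor partition is a simplifying assumption.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Partition:
    """Hypothetical node in a partition hierarchy (a leaf is a cluster of hosts)."""
    name: str
    parent: Optional["Partition"] = None
    # Maps an operator signature, e.g. ("join", "A", "B"), to the host running it.
    deployed: dict = field(default_factory=dict)


def find_reusable(signature, start):
    """Search the local partition first, then each ancestor up to the root.

    Only partitions on the path to the root are inspected, so the number of
    lookups is bounded by the hierarchy depth rather than the network size.
    """
    level = start
    while level is not None:
        host = level.deployed.get(signature)
        if host is not None:
            return (level.name, host)
        level = level.parent
    return None


def deploy(signature, host, partition):
    """Reuse a visible operator if one exists; otherwise deploy a new one."""
    found = find_reusable(signature, partition)
    if found is not None:
        return ("reused",) + found
    # Assumption: a new operator is advertised to every ancestor partition.
    level = partition
    while level is not None:
        level.deployed.setdefault(signature, host)
        level = level.parent
    return ("deployed", partition.name, host)


root = Partition("root")
east = Partition("east", parent=root)
west = Partition("west", parent=root)

join_ab = ("join", "A", "B")
first = deploy(join_ab, "host-1", east)   # nothing to reuse yet
second = deploy(join_ab, "host-7", west)  # found via the shared root
third = deploy(join_ab, "host-2", east)   # found in the local partition
```

In this toy model the second query reuses the operator through the root partition and the third reuses it locally. A real planner would also have to weigh the network cost of reaching a remote operator against the cost of deploying a fresh copy.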
