Distributed Adaptive Windowed Stream Join Processing

This paper presents an adaptive framework for processing a window-based multi-way join query over distributed data streams. The framework integrates distributed plan modification and distributed plan migration through a common building block called the node operator set (NOS). An NOS resides at each node that participates in the join execution and specifies the set of atomic operations that the host node performs locally to execute its share of the global execution plan. The plan modification and migration techniques are presented for two cases: updating the NOSs centrally at a single node, and updating them in a distributed manner at each node. Plan modification is triggered by changes in stream statistics and greedily adjusts the join execution order and placement to satisfy a cost invariant. Plan migration uses the distributed track strategy to accelerate the migration of window extents to new nodes, and the migration of all window extents is synchronized. Experiments confirm the effectiveness of the adaptive framework in reducing the join execution cost and indicate that distributing the NOS update adds only a small adaptation overhead.
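The idea of statistics-triggered, greedy plan modification can be illustrated with a minimal sketch. It is not the paper's algorithm: the invariant used here (stream arrival rates are non-decreasing along the join order), the one-stream-per-node placement, and all names (`NodeOperatorSet`, `greedy_join_order`, `violates_invariant`, `adapt`, `build_nos`) are assumptions made only for illustration.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple


@dataclass
class NodeOperatorSet:
    """Hypothetical NOS: the ordered atomic operations one node runs locally."""
    node: str
    operations: List[str] = field(default_factory=list)


def greedy_join_order(rates: Dict[str, float]) -> List[str]:
    """Order streams by ascending arrival rate (an assumed cost heuristic)."""
    return sorted(rates, key=lambda s: (rates[s], s))


def violates_invariant(order: List[str], rates: Dict[str, float]) -> bool:
    """Assumed invariant: rates are non-decreasing along the join order."""
    return any(rates[a] > rates[b] for a, b in zip(order, order[1:]))


def adapt(order: List[str], rates: Dict[str, float]) -> Tuple[List[str], bool]:
    """Re-plan only when the new statistics break the invariant."""
    if violates_invariant(order, rates):
        return greedy_join_order(rates), True
    return order, False


def build_nos(order: List[str], placement: Dict[str, str]) -> List[NodeOperatorSet]:
    """Derive one NOS per node from the global join order (assumed layout)."""
    result = []
    for i, stream in enumerate(order):
        ops = [f"probe window of {stream}"]
        if i + 1 < len(order):
            # Partial results are forwarded to the node hosting the next stream.
            ops.append(f"send to {placement[order[i + 1]]}")
        else:
            ops.append("emit result")
        result.append(NodeOperatorSet(placement[stream], ops))
    return result


if __name__ == "__main__":
    rates = {"A": 5.0, "B": 1.0, "C": 3.0}          # made-up statistics
    placement = {"A": "n1", "B": "n2", "C": "n3"}   # made-up placement
    order, changed = adapt(["A", "B", "C"], rates)
    print(order, changed)
    for nos in build_nos(order, placement):
        print(nos)
```

In this sketch the plan is recomputed only when the invariant check fails, so stable statistics incur no re-planning; whether the check runs at one coordinator or at every node corresponds loosely to the centralized and distributed NOS update cases described above.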
