Optimized query planning of continuous aggregation queries in dynamic data dissemination networks

Continuous queries are used to monitor changes to time-varying data and to provide results useful for online decision making. Typically, a user wants the value of some aggregation function over distributed data items, for example, (a) the average of the temperatures sensed by a set of sensors, or (b) the value of an index of mid-cap stocks. In such queries, the client specifies a coherency requirement as part of the query. In this paper we present a low-cost, scalable technique for answering continuous aggregation queries using a content distribution network of dynamic data items. In such a network of data aggregators, each data aggregator serves a set of data items at specific coherencies. Just as the various fragments of a dynamic web page are served by one or more nodes of a content distribution network, our technique decomposes a client query into sub-queries and executes each sub-query on a judiciously chosen data aggregator, with its own sub-query incoherency bound. We provide a technique for obtaining the optimal query plan (i.e., the set of sub-queries and their chosen data aggregators) that satisfies the client query's coherency requirement at the least cost, measured as the number of refresh messages sent from the aggregators to the client. To estimate query execution cost, we build a continuous query cost model that estimates the number of messages required to satisfy the client-specified incoherency bound. Performance results using real-world traces show that our cost-based query planning executes queries using fewer than one third of the messages required by existing schemes.
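
The decomposition idea can be illustrated with a minimal sketch. It assumes a weighted-sum query and a hypothetical catalogue that maps each aggregator to the incoherency bound at which it serves each data item. The greedy set-cover heuristic below is a stand-in for illustration only, not the optimal planner or the cost model described in the abstract. All names (`greedy_plan`, `worst_case_incoherency`, the catalogue layout) are assumptions. The bound check relies only on the triangle inequality: if item i is served within bound c_i, a weighted sum with weights w_i is off by at most the sum of |w_i| * c_i.

```python
from typing import Dict, List

# Hypothetical catalogue: aggregator name -> {data item: incoherency bound served}
Catalogue = Dict[str, Dict[str, float]]


def greedy_plan(weights: Dict[str, float], catalogue: Catalogue) -> Dict[str, List[str]]:
    """Greedy set cover: repeatedly pick the aggregator that serves the
    most still-uncovered query items. Returns aggregator -> sub-query items."""
    uncovered = set(weights)
    plan: Dict[str, List[str]] = {}
    while uncovered:
        # sorted() makes tie-breaking deterministic (first name wins).
        best = max(sorted(catalogue),
                   key=lambda a: len(uncovered.intersection(catalogue[a])))
        covered = uncovered.intersection(catalogue[best])
        if not covered:
            raise ValueError(f"no aggregator serves: {sorted(uncovered)}")
        plan[best] = sorted(covered)
        uncovered -= covered
    return plan


def worst_case_incoherency(weights: Dict[str, float], catalogue: Catalogue,
                           plan: Dict[str, List[str]]) -> float:
    """Upper bound on the error of the weighted sum under the given plan."""
    return sum(abs(weights[item]) * catalogue[agg][item]
               for agg, items in plan.items()
               for item in items)


# Illustrative data only: query 1*x + 2*y + 1*z with client bound 1.5.
weights = {"x": 1.0, "y": 2.0, "z": 1.0}
catalogue = {
    "a1": {"x": 0.2, "y": 0.4},
    "a2": {"y": 0.1, "z": 0.3},
    "a3": {"z": 0.2},
}
plan = greedy_plan(weights, catalogue)
bound = worst_case_incoherency(weights, catalogue, plan)
print(plan, bound, bound <= 1.5)
```

A plan that minimizes the number of sub-queries need not minimize refresh messages. Here, serving y from a2 would tighten the bound, which is the kind of trade-off a cost-based planner is meant to weigh.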
