Query Planning for Continuous Aggregation Queries over a Network of Data Aggregators

Continuous queries are used to monitor changes to time-varying data and to provide results useful for online decision making. Typically, a user wants the value of some aggregation function over distributed data items, for example, the value of a client's portfolio or the AVG of the temperatures sensed by a set of sensors. In these queries, the client specifies a coherency requirement as part of the query. We present a low-cost, scalable technique to answer continuous aggregation queries using a network of aggregators of dynamic data items. In such a network, each data aggregator serves a set of data items at specific coherencies. Just as various fragments of a dynamic webpage are served by one or more nodes of a content distribution network, our technique decomposes a client query into subqueries and executes them on judiciously chosen data aggregators, each subquery with its own incoherency bound. We provide a technique for obtaining the optimal set of subqueries, together with their incoherency bounds, that satisfies the client query's coherency requirement with the least number of refresh messages sent from aggregators to the client. To estimate the number of refresh messages, we build a query cost model that estimates the number of messages required to satisfy the client-specified incoherency bound. Performance results using real-world traces show that our cost-based query planning leads to queries being executed using less than one third of the messages required by existing schemes.
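The decomposition idea can be illustrated with a small sketch. It is not the cost-based planner described above: it swaps in a plain greedy set-cover heuristic to pick aggregators, and it assumes a weighted-sum query in which an aggregator serving item i at coherency c_i contributes at most |w_i| * c_i to the subquery's incoherency. The function name `greedy_plan`, the item names, and all numbers are hypothetical.

```python
from typing import Dict, List, Tuple

Plan = List[Tuple[str, List[str], float]]


def greedy_plan(query: Dict[str, float],
                aggregators: Dict[str, Dict[str, float]]) -> Plan:
    """Split a weighted-sum query into one subquery per chosen aggregator.

    query:       data item -> weight in the sum
    aggregators: aggregator name -> {data item: coherency it is served at}
    Returns (aggregator, items, tightest_bound) triples, where tightest_bound
    is the smallest incoherency the aggregator can guarantee for its subquery
    under the assumption stated above.
    """
    remaining = set(query)
    plan: Plan = []
    while remaining:
        # Greedy set cover: take the aggregator covering the most remaining
        # items; ties go to the alphabetically first name.
        best = max(sorted(aggregators),
                   key=lambda a: len(remaining.intersection(aggregators[a])))
        covered = remaining.intersection(aggregators[best])
        if not covered:
            raise ValueError(f"no aggregator serves: {sorted(remaining)}")
        tightest = sum(abs(query[i]) * aggregators[best][i] for i in covered)
        plan.append((best, sorted(covered), tightest))
        remaining -= covered
    return plan


def is_feasible(plan: Plan, client_bound: float) -> bool:
    """For an additive query, subquery incoherencies add up, so the plan can
    meet the client bound only if the tightest bounds sum to at most it."""
    return sum(bound for _, _, bound in plan) <= client_bound


# Hypothetical portfolio query: 200*ibm + 100*msft + 50*orcl
query = {"ibm": 200, "msft": 100, "orcl": 50}
aggregators = {
    "a1": {"ibm": 0.01, "msft": 0.02},
    "a2": {"msft": 0.01, "orcl": 0.04},
}
plan = greedy_plan(query, aggregators)
for name, items, bound in plan:
    print(name, items, round(bound, 6))
print("feasible for bound 10:", is_feasible(plan, 10.0))
```

Any slack between the client bound and the sum of the tightest bounds could be handed back to the subqueries as looser incoherency bounds, which is where a message cost model would decide how to split it; the sketch leaves that step out.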
