Handling Non-linear Polynomial Queries over Dynamic Data

Applications that monitor functions over rapidly and unpredictably changing data express their needs as continuous queries. Our focus is on a rich class of queries, expressed as polynomials over multiple data items. Given a set of polynomial queries at a coordinator C and a user-specified accuracy bound (tolerable imprecision) for each query, we address the problem of assigning data accuracy bounds, or filters, to the source of each data item. Assigning data accuracy bounds for non-linear queries poses special challenges: unlike those for linear queries, data accuracy bounds for non-linear queries depend on the current values of the data items and hence need to be recomputed frequently. We therefore seek an assignment such that (a) if the value of each data item at C is within its data accuracy bound, then the value of each query is also within its accuracy bound; (b) the number of data refreshes sent by sources to C to meet the query accuracy bounds is as low as possible; and (c) the number of times the data accuracy bounds need to be recomputed is as low as possible. In this paper, we couple novel ideas with existing optimization techniques to derive such an assignment. Specifically, we make the following contributions: (i) we propose a novel technique that significantly reduces the number of times data accuracy bounds must be recomputed; (ii) we show that a small increase in the number of data refreshes can lead to a large reduction in the number of recomputations, and we introduce this as a tradeoff in our approach; (iii) we give principled heuristics for polynomial queries with negative coefficients, for which no known optimization techniques can be used, and prove that under many practically encountered conditions our heuristics can be close to optimal; and (iv) we present a detailed experimental evaluation demonstrating the efficacy of our techniques in handling a large number of polynomial queries.
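To illustrate why data accuracy bounds for non-linear queries depend on current data values, consider a minimal sketch for the single product query Q = x*y. It is not the assignment technique of this paper; it only checks condition (a) for a given pair of bounds. The function names and the numbers used are hypothetical, chosen for illustration. If the true values lie within b_x and b_y of the values held at C, the standard inequality |x'y' - xy| <= |x| b_y + |y| b_x + b_x b_y bounds the query error, and the right-hand side grows with |x| and |y|.

```python
def product_query_error(x, y, bx, by):
    """Upper bound on |x2*y2 - x*y| when |x2 - x| <= bx and |y2 - y| <= by.

    Follows from x2*y2 - x*y = x*dy + y*dx + dx*dy with |dx| <= bx, |dy| <= by.
    (Hypothetical helper, not taken from the paper.)
    """
    return abs(x) * by + abs(y) * bx + bx * by


def bounds_are_valid(x, y, bx, by, query_bound):
    """Condition (a): data bounds (bx, by) keep Q = x*y within query_bound."""
    return product_query_error(x, y, bx, by) <= query_bound


def sum_query_error(bx, by):
    """For the linear query x + y the worst-case error ignores x and y."""
    return bx + by


# Illustrative values only.
bx, by, query_bound = 0.1, 0.2, 5.0

# At x = 10, y = 20 the data bounds satisfy the query bound ...
ok_before = bounds_are_valid(10.0, 20.0, bx, by, query_bound)

# ... but after x drifts to 30 the same data bounds no longer do,
# so they would have to be recomputed.
ok_after = bounds_are_valid(30.0, 20.0, bx, by, query_bound)
```

With the same b_x and b_y, the linear query x + y has worst-case error b_x + b_y regardless of the data values, which is why its data accuracy bounds do not need this kind of recomputation.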
