Time-sensitive computation of aggregate functions over distributed imprecise data

Summary form only given. Many real-world distributed applications now require real-time services in which aggregate queries are computed over a set of values. These applications can often tolerate varying degrees of inaccuracy in the results. System designers, on the other hand, would like to provide services with low inaccuracy and minimal management overhead. We focus on the tradeoffs between timeliness, accuracy and cost for data aggregation in distributed environments. Specifically, we address the problem of time-sensitive computation of aggregate queries (count, sum and min) over a set of values, each represented by an interval with a lower and an upper bound. These intervals are approximations based on the most recent values received from the distributed sources. To meet the precision constraints specified by users, a subset of the sources needs to be probed for exact values. We first propose algorithms for batch selection of the probing set, where the selection is made before probing, without knowledge of the actual values. In addition, we propose an iterative selection approach, where the choice of the next source to probe depends on the value returned by the previous probe.
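The difference between batch and iterative selection can be illustrated with a minimal Python sketch. It is not the paper's algorithm: the widest-first rule for sum, the smallest-lower-bound rule for min, and all names (`sum_bounds`, `batch_select_for_sum`, `iterative_min`, `probe`, `max_width`) are assumptions made for illustration, with every probe treated as having the same cost and no deadline modeled.

```python
from typing import Callable, Dict, List, Tuple

Interval = Tuple[float, float]  # (lower bound, upper bound) cached for one source


def sum_bounds(intervals: Dict[str, Interval]) -> Interval:
    """Bounds on SUM: add the lower bounds and the upper bounds separately."""
    lows = sum(lo for lo, _ in intervals.values())
    highs = sum(hi for _, hi in intervals.values())
    return (lows, highs)


def batch_select_for_sum(intervals: Dict[str, Interval], max_width: float) -> List[str]:
    """Pick the whole probing set before any probe is sent (hypothetical heuristic).

    Probing a source collapses its interval to a point, so the width of the
    SUM bound shrinks by that interval's width whatever value comes back.
    Sources are therefore taken widest first until the width fits.
    """
    remaining = sum(hi - lo for lo, hi in intervals.values())
    by_width = sorted(intervals.items(), key=lambda kv: kv[1][1] - kv[1][0], reverse=True)
    chosen: List[str] = []
    for name, (lo, hi) in by_width:
        if remaining <= max_width:
            break
        chosen.append(name)
        remaining -= hi - lo
    return chosen


def iterative_min(
    intervals: Dict[str, Interval],
    probe: Callable[[str], float],
    max_width: float,
) -> Tuple[Interval, List[str]]:
    """Probe one source at a time for MIN, re-deciding after each returned value.

    Assumes max_width >= 0 and that probe(name) returns a value inside the
    cached interval of that source.
    """
    current = dict(intervals)
    probed: List[str] = []
    while True:
        lo = min(low for low, _ in current.values())
        hi = min(high for _, high in current.values())
        if hi - lo <= max_width:
            return (lo, hi), probed
        # Hypothetical rule: probe the still-uncertain source with the smallest lower bound.
        uncertain = [n for n, (low, high) in current.items() if high > low]
        name = min(uncertain, key=lambda n: current[n][0])
        value = probe(name)
        current[name] = (value, value)
        probed.append(name)
```

In this sketch the batch routine never looks at a probed value, whereas the iterative routine recomputes the bounds on min after every probe and can stop as soon as a returned value makes the remaining sources irrelevant.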
