Fixed-precision approximate continuous aggregate queries in peer-to-peer databases

In this paper, we propose an efficient sample-based approach to answering fixed-precision approximate continuous aggregate queries in peer-to-peer databases. First, we define practical semantics for formulating fixed-precision approximate continuous aggregate queries. Second, we propose “Digest”, a two-tier system for correct and efficient query answering by sampling. At the top tier, we develop a query evaluation engine that uses samples collected from the peer-to-peer database to continually estimate the running result of the approximate continuous aggregate query with guaranteed precision. For efficient query evaluation, we propose an extrapolation algorithm that predicts the evolution of the running result and adapts the frequency of the sampling occasions accordingly, avoiding redundant samples. We also introduce a repeated sampling algorithm that exploits the correlation between the samples at successive sampling occasions, using linear regression to minimize the number of samples drawn at each occasion. At the bottom tier, we introduce a distributed sampling algorithm for random sampling (uniform and nonuniform) from peer-to-peer databases with arbitrary network topology and tuple distribution. Our sampling algorithm is based on the Metropolis Markov Chain Monte Carlo method, which guarantees randomness of the sample with an arbitrarily small variation difference from the desired distribution, while remaining comparable to optimal sampling in sampling cost and time. We evaluate the efficiency of Digest via simulation using real data.
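As a rough illustration of the bottom-tier idea, the following is a minimal sketch of a Metropolis random walk that draws approximately uniform samples of peers from an unstructured overlay. It is not the Digest algorithm itself: it samples peers rather than tuples, covers only the uniform case, and the names `metropolis_walk`, `sample_peers`, and the toy overlay `EXAMPLE_OVERLAY` are assumptions introduced here for illustration. The acceptance rule min(1, deg(u)/deg(v)) is the standard Metropolis correction that makes the walk's stationary distribution uniform on a connected, undirected graph.

```python
import random

# Hypothetical toy overlay (not from the paper): an undirected, connected,
# non-regular graph stored as adjacency lists. Peer 0 is a hub.
EXAMPLE_OVERLAY = {
    0: [1, 2, 3, 4],
    1: [0, 2],
    2: [0, 1],
    3: [0],
    4: [0],
}


def metropolis_walk(adj, start, steps, rng):
    """Run a Metropolis random walk for `steps` hops and return the final peer."""
    current = start
    for _ in range(steps):
        # Propose a neighbor uniformly at random.
        candidate = rng.choice(adj[current])
        # Accept with probability min(1, deg(current) / deg(candidate)).
        # This cancels the bias of a plain random walk toward high-degree
        # peers, so the stationary distribution is uniform over peers.
        accept_prob = min(1.0, len(adj[current]) / len(adj[candidate]))
        if rng.random() < accept_prob:
            current = candidate
        # On rejection the walk stays put, which also keeps it aperiodic
        # on graphs whose degrees are not all equal.
    return current


def sample_peers(adj, start, n_samples, steps, seed=0):
    """Draw `n_samples` peers, each from an independent walk of length `steps`."""
    rng = random.Random(seed)
    return [metropolis_walk(adj, start, steps, rng) for _ in range(n_samples)]
```

With enough hops per walk, each peer in `EXAMPLE_OVERLAY` should be returned with frequency close to 1/5 regardless of its degree. The walk length needed in practice depends on the mixing time of the overlay, which this sketch does not estimate.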
