Approximate temporal aggregation

Temporal aggregate queries retrieve summarized information about records with time-evolving attributes. Existing approaches have at least one of the following shortcomings: (i) they incur large space requirements, (ii) they have high processing cost, and (iii) they are based on complex structures that are not available in commercial systems. We address these problems with approximation techniques that have bounded error. We propose two methods: the first is based on multiversion B-trees and has logarithmic worst-case query cost, while the second uses off-the-shelf B- and R-trees and achieves the same performance in the expected case. We experimentally demonstrate that the proposed methods consume an order of magnitude less space than their competitors and are significantly faster, even when the permissible error bound is very small.
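To make the query type concrete, the following is a minimal sketch of an exact temporal aggregate answered by a linear scan. It is not either of the proposed methods; it only illustrates what such a query computes. The `Record` layout, the half-open lifespan convention, the key-range predicate, and the function name `temporal_aggregate` are all assumptions introduced for illustration, since the abstract does not fix a precise query definition.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Record:
    """Hypothetical record: a keyed value that is alive during [start, end)."""
    key: int
    value: float
    start: int  # first timestamp at which the record is alive (inclusive)
    end: int    # timestamp at which the record is deleted (exclusive)


def temporal_aggregate(records, key_lo, key_hi, t_lo, t_hi):
    """Return (count, sum) over records whose key lies in [key_lo, key_hi]
    and whose lifespan intersects the closed query interval [t_lo, t_hi].

    This is a brute-force O(n) scan used as a reference answer only.
    """
    count = 0
    total = 0.0
    for r in records:
        in_key_range = key_lo <= r.key <= key_hi
        # [r.start, r.end) intersects [t_lo, t_hi]
        overlaps = r.start <= t_hi and t_lo < r.end
        if in_key_range and overlaps:
            count += 1
            total += r.value
    return count, total


records = [
    Record(key=10, value=5.0, start=0, end=4),
    Record(key=20, value=7.0, start=2, end=9),
    Record(key=30, value=1.0, start=5, end=6),
    Record(key=40, value=3.0, start=8, end=12),
]

# Keys 10..30, timestamps 3..5: the first three records qualify.
result = temporal_aggregate(records, 10, 30, 3, 5)
```

A scan like this touches every record, which is the cost that index-based and approximate techniques aim to avoid; an approximate method would return a value within a permitted error of this exact answer.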
