Parsimonious temporal aggregation

Temporal aggregation is a crucial operator in temporal databases and has been studied in various flavors, including instant temporal aggregation (ITA) and span temporal aggregation (STA), each with its own strengths and weaknesses. In this paper we define a new temporal aggregation operator, called parsimonious temporal aggregation (PTA), which comprises two main steps: (i) it computes the ITA result over the input relation, and (ii) it compresses this intermediate result to a user-specified size c by merging adjacent tuples while keeping the induced total error minimal. The compressed ITA result is returned as the final result. By taking the distribution of the input data into account and letting the user control the result size, PTA combines the best features of ITA and STA. We provide two evaluation algorithms for PTA queries. First, the oPTA algorithm computes an exact solution: it applies dynamic programming to explore all possible compressions of the ITA result and selects the one with the minimal total error. It runs in O(n²pc) time and O(n²) space, where n is the size of the input relation and p is the number of aggregation functions in the query. Second, the more efficient gPTA algorithm computes an approximate solution by greedily merging the most similar ITA result tuples; this does not guarantee a compression with minimal total error. gPTA interleaves the two steps of PTA and thereby avoids large intermediate results. The compression step of gPTA runs in O(np log(c + Δ)) time and O(c + Δ) space, where Δ is the size of a small "look ahead" buffer. An empirical evaluation shows good results: considerable reductions of the result size introduce only small errors, gPTA scales to large data sets, and its results are only slightly worse than the exact solution of PTA.
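
The compression step can be illustrated with a small sketch. It is not the paper's oPTA or gPTA implementation and rests on several assumptions that the abstract does not state: a single aggregate value per tuple (p = 1), no grouping attributes, no temporal gaps between adjacent ITA tuples, a merged value equal to the length-weighted mean, and an error measure equal to the length-weighted sum of squared errors. The function names `exact_compress` and `greedy_compress` are hypothetical. The greedy variant rescans all pairs in each round instead of using a heap and a look-ahead buffer, so it does not reach the stated gPTA complexity.

```python
def _prefix(ita):
    # Prefix sums of length, length*value and length*value^2 (assumed error model).
    L, S, Q = [0.0], [0.0], [0.0]
    for start, end, value in ita:
        w = end - start
        L.append(L[-1] + w)
        S.append(S[-1] + w * value)
        Q.append(Q[-1] + w * value * value)
    return L, S, Q


def _sse(P, i, j):
    # Error of replacing tuples i..j-1 by one tuple holding their weighted mean.
    L, S, Q = P
    w, s, q = L[j] - L[i], S[j] - S[i], Q[j] - Q[i]
    return max(0.0, q - s * s / w)


def exact_compress(ita, c):
    """Dynamic program: minimal-error compression of ita to c tuples (1 <= c <= n)."""
    n, P = len(ita), _prefix(ita)
    INF = float("inf")
    E = [[INF] * (n + 1) for _ in range(c + 1)]   # E[k][j]: first j tuples -> k tuples
    back = [[0] * (n + 1) for _ in range(c + 1)]  # split point achieving E[k][j]
    E[0][0] = 0.0
    for k in range(1, c + 1):
        for j in range(k, n + 1):
            for i in range(k - 1, j):
                cand = E[k - 1][i] + _sse(P, i, j)
                if cand < E[k][j]:
                    E[k][j], back[k][j] = cand, i
    # Walk the split points backwards to recover the merged tuples.
    L, S, _ = P
    result, j = [], n
    for k in range(c, 0, -1):
        i = back[k][j]
        result.append((ita[i][0], ita[j - 1][1], (S[j] - S[i]) / (L[j] - L[i])))
        j = i
    result.reverse()
    return result, E[c][n]


def greedy_compress(ita, c):
    """Greedy variant: repeatedly merge the adjacent pair that adds the least error."""
    segs = [[s, e, v] for s, e, v in ita]
    total = 0.0

    def cost(i):
        # Increase in weighted squared error when merging segments i and i+1.
        (s1, e1, v1), (s2, e2, v2) = segs[i], segs[i + 1]
        w1, w2 = e1 - s1, e2 - s2
        return w1 * w2 / (w1 + w2) * (v1 - v2) ** 2

    while len(segs) > c:
        i = min(range(len(segs) - 1), key=cost)
        total += cost(i)
        (s1, e1, v1), (s2, e2, v2) = segs[i], segs[i + 1]
        w1, w2 = e1 - s1, e2 - s2
        segs[i:i + 2] = [[s1, e2, (w1 * v1 + w2 * v2) / (w1 + w2)]]
    return [tuple(seg) for seg in segs], total
```

Each input tuple is `(start, end, value)` with a half-open interval, for example `exact_compress([(0, 2, 10.0), (2, 4, 11.0), (4, 6, 30.0), (6, 8, 31.0)], 2)`. On any input the greedy error is at least as large as the dynamic-programming error, which mirrors the trade-off between the exact and the approximate algorithm described above.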
