TSA-tree: a wavelet-based approach to improve the efficiency of multi-level surprise and trend queries on time-series data

We introduce a novel wavelet-based tree structure, termed the TSA-tree, which improves the efficiency of multi-level trend and surprise queries on time sequence data. With the explosion of scientific observation data conceptualized as time sequences, we face the challenge of efficiently storing, retrieving, and analyzing this data. Frequent queries on such data look for trends (e.g., global warming) or surprises (e.g., an undersea volcano eruption) within the original time series. The challenge, however, is that these trend and surprise queries are needed at different levels of abstraction. To support these multi-level queries, a large subset of the raw data must sometimes be retrieved and processed. To expedite this process, we utilize our TSA-tree. Each node of the TSA-tree contains pre-computed trends and surprises at a particular level, and a wavelet transform is applied recursively to construct the nodes. As a result, each node of the TSA-tree is readily available for visualization of trends and surprises. In addition, each node is significantly smaller than the original time series, resulting in faster I/O operations. A limitation of the TSA-tree, however, is that its total size is larger than that of the original time series. To address this shortcoming, we first prove that the optimal subtree of the TSA-tree (the OTSA-tree) requires no more storage than the original time series, without losing any information. Next, we propose two alternative techniques that reduce the size of the OTSA-tree even further while maintaining acceptable query precision compared with querying the original time sequences. Using real and synthetic time sequence databases, we compare our techniques with some well-known algorithms.
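
The recursive construction described above can be illustrated with a minimal sketch. It assumes the Haar wavelet (the abstract does not name a specific filter), a series whose length is a power of two, and hypothetical function names (`haar_step`, `build_tsa_levels`, `reconstruct`) that are not taken from the paper. It also assumes that the lossless subtree consists of the coarsest trend node plus every surprise node, which is one reading of the OTSA-tree claim rather than a confirmed definition.

```python
import math

SQRT2 = math.sqrt(2.0)

def haar_step(x):
    """Split an even-length sequence into a trend half and a surprise half
    using the orthonormal Haar filter (an assumed choice of wavelet)."""
    trend = [(x[i] + x[i + 1]) / SQRT2 for i in range(0, len(x), 2)]
    surprise = [(x[i] - x[i + 1]) / SQRT2 for i in range(0, len(x), 2)]
    return trend, surprise

def build_tsa_levels(x, levels):
    """Recursively decompose the trend; return [(trend_1, surprise_1), ...]."""
    nodes = []
    current = list(x)
    for _ in range(levels):
        if len(current) < 2 or len(current) % 2:
            break  # cannot split further
        trend, surprise = haar_step(current)
        nodes.append((trend, surprise))
        current = trend  # only the trend is decomposed again
    return nodes

def reconstruct(final_trend, surprises_coarse_to_fine):
    """Rebuild the original series from the coarsest trend and all surprises."""
    current = list(final_trend)
    for surprise in surprises_coarse_to_fine:
        finer = []
        for a, d in zip(current, surprise):
            finer.append((a + d) / SQRT2)
            finer.append((a - d) / SQRT2)
        current = finer
    return current

series = [2.0, 4.0, 6.0, 8.0, 10.0, 12.0, 14.0, 16.0]
nodes = build_tsa_levels(series, levels=3)

# Every node (trend and surprise at each level) versus the lossless subset.
full_tree_size = sum(len(t) + len(s) for t, s in nodes)
final_trend = nodes[-1][0]
surprises = [s for _, s in reversed(nodes)]
subset_size = len(final_trend) + sum(len(s) for s in surprises)
rebuilt = reconstruct(final_trend, surprises)

print(full_tree_size, subset_size, len(series))
```

In this sketch the full set of nodes holds 14 values for an 8-value series, while the coarsest trend plus the three surprise nodes holds exactly 8 and suffices to rebuild the input, which is consistent with the storage claim in the abstract under the stated assumptions.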
