2D TSA-tree: a wavelet-based approach to improve the efficiency of multi-level spatial data mining

Due to the large amount of collected scientific data, it is becoming increasingly difficult for scientists to comprehend and interpret the available data. Moreover, typical queries on these data sets aim to identify (or visualize) trends and surprises in a selected sub-region at multiple levels of abstraction, rather than to retrieve information about a specific data point. We propose a versatile wavelet-based data structure, the 2D TSA-tree (Trend and Surprise Abstractions Tree), to enable efficient multi-level trend detection on spatial data. We show how the 2D TSA-tree can be used efficiently for sub-region selections. Moreover, the 2D TSA-tree can be used to precompute the reconstruction error and retrieval time of a data subset, so that the user can trade off accuracy for response time (or vice versa) at query time. Finally, when storage space is limited, our 2D Optimal TSA-tree saves on storage by storing only a specific optimal subset of the tree. To demonstrate the effectiveness of the proposed methods, we evaluated the 2D TSA-tree using real and synthetic data. Our results show that our method outperformed other methods (DFT and SVD) in terms of accuracy, complexity and scalability.
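The idea of separating trends from surprises at several levels can be illustrated with a minimal sketch. It assumes an unnormalized 2D Haar-style split on a grid whose dimensions are divisible by 2 at every level; the abstract does not specify the wavelet filter or the tree layout, so the function names `haar_step_2d` and `build_levels` and the list-of-levels representation are hypothetical and are not the authors' implementation. At each level, the average of every 2x2 block serves as the trend, and three difference sub-bands serve as the surprises.

```python
def haar_step_2d(grid):
    """One level of an unnormalized 2D Haar-style split (illustrative only).

    Returns (trend, surprises). trend holds the mean of each 2x2 block;
    surprises is a tuple of three detail grids: column differences,
    row differences, and diagonal differences.
    """
    rows, cols = len(grid), len(grid[0])
    trend, col_diff, row_diff, diag_diff = [], [], [], []
    for r in range(0, rows, 2):
        t_row, c_row, r_row, d_row = [], [], [], []
        for c in range(0, cols, 2):
            a, b = grid[r][c], grid[r][c + 1]
            d, e = grid[r + 1][c], grid[r + 1][c + 1]
            t_row.append((a + b + d + e) / 4)   # block average (trend)
            c_row.append((a - b + d - e) / 4)   # left column minus right column
            r_row.append((a + b - d - e) / 4)   # top row minus bottom row
            d_row.append((a - b - d + e) / 4)   # diagonal contrast
        trend.append(t_row)
        col_diff.append(c_row)
        row_diff.append(r_row)
        diag_diff.append(d_row)
    return trend, (col_diff, row_diff, diag_diff)


def build_levels(grid, levels):
    """Split the trend repeatedly; returns one (trend, surprises) pair per level."""
    result = []
    current = grid
    for _ in range(levels):
        current, surprises = haar_step_2d(current)
        result.append((current, surprises))
    return result


# A 4x4 grid that is smooth except for one outlier (the value 8).
data = [
    [1, 1, 2, 2],
    [1, 1, 2, 2],
    [3, 3, 4, 8],
    [3, 3, 4, 4],
]
tree = build_levels(data, 2)
level1_trend, level1_surprises = tree[0]
level2_trend, _ = tree[1]
```

In this sketch, the level-1 surprise grids are zero everywhere except in the block containing the outlier, while the trend grids give progressively coarser summaries of the data. A multi-level query could then read only the grids for the requested level and sub-region.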
