A Wavelet-Based Approach to Improve the Efficiency of Multi-Level Surprise Mining

Because of the large volume of collected scientific data, it is becoming increasingly difficult for scientists to comprehend and interpret the available data. Moreover, typical queries on these data sets involve identifying (or visualizing) trends and surprises in a selected sub-region at multiple levels of abstraction, rather than retrieving information about a specific data point. In this paper, we show how a wavelet-based data structure, the 2D TSA-tree (short for Trend and Surprise Abstractions tree), can be used efficiently to detect surprises in spatio-temporal data at different levels. Furthermore, we show how to find surprises within a specified period of time at different levels of abstraction (e.g., weekly or monthly) by constructing a 1D TSA-tree. To demonstrate the effectiveness of the proposed methods, we evaluated the 2D TSA-tree using real and synthetic data. The results indicate that the 2D TSA-tree approach can be used to visualize different kinds of surprises effectively.
