High performance computing for spatial outliers detection using parallel wavelet transform

Wavelet analysis is a practical tool for signal analysis and image processing. The traditional Fourier transform also maps a signal into the frequency domain, but wavelet analysis is more attractive because it offers multi-resolution analysis and localization in frequency. Recently, there has been significant development in the use of wavelet methods in data mining. The objective of the study described in this paper is twofold: to design a wavelet transform algorithm for a multiprocessor architecture, and to use this algorithm to mine spatial outliers in meteorological data. Spatial outliers are spatial objects whose features are distinct from those of their surrounding neighbors. Outlier detection can reveal important and valuable information in large spatial data sets. Because region outliers are commonly multi-scale objects, wavelet analysis is an effective tool for studying them. In this paper, we present a wavelet-based approach and demonstrate its applicability to outlier detection. We design a suite of algorithms to discover region outliers effectively, together with a parallel algorithm that improves the efficiency and speedup of the wavelet analysis. The applicability and effectiveness of the developed algorithms are evaluated on a real-world meteorological dataset.
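
To make the idea concrete, the following is a minimal sketch and not the algorithm developed in the paper. It assumes a one-level Haar wavelet, a row-wise split of a gridded field across workers, and a simple k-standard-deviation threshold on the detail coefficients; the function names, the thread pool, and the value of k are all illustrative assumptions. Threads are used only to show how the rows decompose into independent tasks; on a real multiprocessor, a process pool or message passing would likely be needed to obtain speedup.

```python
from concurrent.futures import ThreadPoolExecutor
import math


def haar_1d(row):
    """One level of the orthonormal Haar transform of a row.

    Returns (approximation, detail). A trailing element of an
    odd-length row is ignored in this sketch.
    """
    s = math.sqrt(2.0)
    approx = [(row[i] + row[i + 1]) / s for i in range(0, len(row) - 1, 2)]
    detail = [(row[i] - row[i + 1]) / s for i in range(0, len(row) - 1, 2)]
    return approx, detail


def parallel_row_details(grid, workers=4):
    """Transform each row independently; rows are shared among workers."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        results = list(pool.map(haar_1d, grid))
    return [detail for _, detail in results]


def flag_outliers(details, k=3.0):
    """Return (row, pair_index) of detail coefficients that lie more
    than k standard deviations from the mean (assumed threshold rule)."""
    flat = [d for row in details for d in row]
    mean = sum(flat) / len(flat)
    std = math.sqrt(sum((d - mean) ** 2 for d in flat) / len(flat))
    if std == 0:
        return []
    return [(r, c)
            for r, row in enumerate(details)
            for c, d in enumerate(row)
            if abs(d - mean) > k * std]


if __name__ == "__main__":
    # Hypothetical 8x8 field with a single anomalous cell.
    grid = [[10.0] * 8 for _ in range(8)]
    grid[3][4] = 50.0
    print(flag_outliers(parallel_row_details(grid)))
```

Each row is transformed without reference to any other row, which is what makes a row-wise distribution of the work straightforward. A full two-dimensional, multi-level transform would also need a column pass and a data exchange between the two passes.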
