WaveCluster: A Multi-Resolution Clustering Approach for Very Large Spatial Databases

Many applications require the management of spatial data. Clustering large spatial databases is an important problem which tries to find the densely populated regions in the feature space to be used in data mining, knowledge discovery, or efficient information retrieval. A good clustering approach should be efficient and detect clusters of arbitrary shape. It must be insensitive to the outliers (noise) and the order of input data. We propose WaveCluster, a novel clustering approach based on wavelet transforms, which satisfies all the above requirements. Using multiresolution property of wavelet transforms, we can effectively identify arbitrary shape clusters at different degrees of accuracy. We also demonstrate that WaveCluster is highly efficient in terms of time complexity. Experimental results on very large data sets are presented which show the efficiency and effectiveness of the proposed approach compared to the other recent clustering methods. This research is supported by Xerox Corporation. Permission to copy without fee all or part of this material is granted provided that the copies are not made or distributed for direct commercial advantage, the VLDB copyright notice and the title of the publication and its date appear, and notice is given that copying is by permission of the Very Large Data Base Endowment. To copy otherwise, or to republish, requires a fee and/or special permission from the Endowment. Proceedings of the 24th VLDB Conference New York, USA, 1998

[1]  R. Ng,et al.  Eecient and Eeective Clustering Methods for Spatial Data Mining , 1994 .

[2]  Shih-Fu Chang,et al.  Transform features for texture classification and discrimination in large image databases , 1994, Proceedings of 1st International Conference on Image Processing.

[3]  Sartaj Sahni,et al.  Finding Connected Components and Connected Ones on a Mesh-Connected Parallel Computer , 1980, SIAM J. Comput..

[4]  Aidong Zhang,et al.  Approach to clustering large visual databases using wavelet transform , 1997, Electronic Imaging.

[5]  Y. Meyer,et al.  Wavelets and Filter Banks , 1991 .

[6]  Tian Zhang,et al.  BIRCH: an efficient data clustering method for very large databases , 1996, SIGMOD '96.

[7]  Robert J. Schalkoff,et al.  Pattern recognition - statistical, structural and neural approaches , 1991 .

[8]  Berthold K. P. Horn Robot vision , 1986, MIT electrical engineering and computer science series.

[9]  Ali S. Hadi,et al.  Finding Groups in Data: An Introduction to Chster Analysis , 1991 .

[10]  S. Mallat Multiresolution approximations and wavelet orthonormal bases of L^2(R) , 1989 .

[11]  Stéphane Mallat,et al.  A Theory for Multiresolution Signal Decomposition: The Wavelet Representation , 1989, IEEE Trans. Pattern Anal. Mach. Intell..

[12]  Jiawei Han,et al.  Efficient and Effective Clustering Methods for Spatial Data Mining , 1994, VLDB.

[13]  Jiong Yang,et al.  STING: A Statistical Information Grid Approach to Spatial Data Mining , 1997, VLDB.

[14]  Uzi Vishkin,et al.  An O(log n) Parallel Connectivity Algorithm , 1982, J. Algorithms.

[15]  C. Fraley,et al.  Nonparametric Maximum Likelihood Estimation of Features in Spatial Point Processes Using Voronoï Tessellation , 1997 .

[16]  Hans-Peter Kriegel,et al.  A Density-Based Algorithm for Discovering Clusters in Large Spatial Databases with Noise , 1996, KDD.

[17]  Y. J. Tejwani,et al.  Robot vision , 1989, IEEE International Symposium on Circuits and Systems,.

[18]  Aidong Zhang,et al.  Geographical image classification and retrieval , 1997, GIS '97.

[19]  A. Raftery,et al.  Nearest-Neighbor Clutter Removal for Estimating Features in Spatial Point Processes , 1998 .