GCA: A real-time grid-based clustering algorithm for large data set

Few existing unsupervised learning (clustering) methods consider clustering data points in a low-dimensional subspace in real time. In this paper, we present a grid-based clustering algorithm (GCA) with time complexity O(n), where n is the number of data points. Unlike previous clustering algorithms, GCA pays particular attention to running time. GCA achieves a low running time by (i) determining the number of clusters from the point density of the grid cells and (ii) computing distances between the cluster centers and the grid cells rather than the individual data points. To make GCA more efficient, principal component analysis (PCA) is used to project the data points from a high-dimensional space into a low-dimensional one. Finally, we analyze the performance of GCA and show that it outperforms most current state-of-the-art methods in terms of efficiency. In particular, its running time is several orders of magnitude lower than that of the k-means algorithm.
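The abstract names three ingredients: PCA to reduce dimensionality, a cluster count derived from grid-cell density, and distances computed between cluster centers and grid cells instead of individual points. It does not give the exact procedure. The following is therefore a minimal sketch under stated assumptions, not the authors' implementation. The function names `pca_project` and `grid_cluster`, the parameters `cell_size` and `min_density`, the rule that face-adjacent dense cells merge into one cluster, and the use of power iteration for PCA are all hypothetical choices made for illustration.

```python
import math
import random
from collections import defaultdict, deque


def pca_project(points, k, iters=100):
    """Project points onto their top-k principal components.

    Hypothetical helper: uses power iteration with deflation on the
    covariance matrix (pure Python, O(n * d^2) to build the covariance).
    """
    n, d = len(points), len(points[0])
    mean = [sum(p[j] for p in points) / n for j in range(d)]
    centered = [[p[j] - mean[j] for j in range(d)] for p in points]
    cov = [[sum(r[a] * r[b] for r in centered) / n for b in range(d)] for a in range(d)]
    components = []
    rng = random.Random(0)
    for _ in range(k):
        v = [rng.random() + 0.1 for _ in range(d)]
        for _step in range(iters):
            w = [sum(cov[a][b] * v[b] for b in range(d)) for a in range(d)]
            norm = math.sqrt(sum(x * x for x in w)) or 1.0
            v = [x / norm for x in w]
        lam = sum(v[a] * sum(cov[a][b] * v[b] for b in range(d)) for a in range(d))
        # Deflate: remove the found component so the next pass finds the next one.
        cov = [[cov[a][b] - lam * v[a] * v[b] for b in range(d)] for a in range(d)]
        components.append(v)
    return [[sum(r[j] * c[j] for j in range(d)) for c in components] for r in centered]


def grid_cluster(points, cell_size, min_density):
    """Hypothetical grid clustering: returns (per-point labels, cluster centers)."""
    # One pass over the points: map each point to an integer grid cell and count.
    cell_of = [tuple(math.floor(x / cell_size) for x in p) for p in points]
    counts = defaultdict(int)
    for c in cell_of:
        counts[c] += 1
    dense = {c for c, m in counts.items() if m >= min_density}
    # Connected groups of face-adjacent dense cells become clusters, so the
    # number of clusters follows from cell density instead of being an input.
    label_of_dense, centers = {}, []
    for start in dense:
        if start in label_of_dense:
            continue
        label = len(centers)
        label_of_dense[start] = label
        queue = deque([start])
        weight, acc = 0, [0.0] * len(start)
        while queue:
            cell = queue.popleft()
            m = counts[cell]
            weight += m
            for j, idx in enumerate(cell):
                acc[j] += m * (idx + 0.5) * cell_size  # count-weighted cell midpoint
            for j in range(len(cell)):
                for step in (-1, 1):
                    nb = cell[:j] + (cell[j] + step,) + cell[j + 1:]
                    if nb in dense and nb not in label_of_dense:
                        label_of_dense[nb] = label
                        queue.append(nb)
        centers.append([a / weight for a in acc])
    # Distances are computed per occupied cell (cell midpoint to each center),
    # not per data point; points then inherit their cell's label.
    cell_label = {}
    for cell in counts:
        if cell in label_of_dense:
            cell_label[cell] = label_of_dense[cell]
        elif centers:
            mid = [(idx + 0.5) * cell_size for idx in cell]
            cell_label[cell] = min(range(len(centers)), key=lambda i: math.dist(mid, centers[i]))
        else:
            cell_label[cell] = -1  # no dense cells at all
    return [cell_label[c] for c in cell_of], centers


if __name__ == "__main__":
    rng = random.Random(7)
    blob_a = [[rng.gauss(0, 0.1), rng.gauss(0, 0.1), rng.gauss(0, 0.02)] for _ in range(200)]
    blob_b = [[rng.gauss(5, 0.1), rng.gauss(5, 0.1), rng.gauss(0, 0.02)] for _ in range(200)]
    low = pca_project(blob_a + blob_b, k=2)
    labels, centers = grid_cluster(low, cell_size=1.0, min_density=5)
    print(len(centers), labels[0], labels[200])
```

In this sketch, every step after building the covariance matrix makes a constant number of passes over the points, and the final assignment loop iterates over occupied cells and cluster centers rather than points. For a fixed dimensionality and number of clusters, the cost therefore grows linearly with n, which is the property the abstract emphasizes.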
