Fast global k-means clustering based on local geometrical information

The fast global k-means (FGKM) clustering algorithm is one of the most effective approaches for resolving the local convergence of the k-means clustering algorithm. Numerical experiments show that it can effectively determine a global or near global minimizer of the cost function. However, the FGKM algorithm needs a large amount of computational time or storage space when handling large data sets. To overcome this deficiency, a more efficient FGKM algorithm, namely FGKM+A, is developed in this paper. In the development, we first apply local geometrical information to describe approximately the set of objects represented by a candidate cluster center. On the basis of the approximate description, we then propose an acceleration mechanism for the production of new cluster centers. As a result of the acceleration, the FGKM+A algorithm not only yields the same clustering results as that of the FGKM algorithm but also requires less computational time and fewer distance calculations than the FGKM algorithm and its existing modifications. The efficiency of the FGKM+A algorithm is further confirmed by experimental studies on several UCI data sets.

[1]  Anil K. Jain,et al.  Algorithms for Clustering Data , 1988 .

[2]  Adil M. Bagirov,et al.  Fast modified global k-means algorithm for incremental cluster construction , 2011, Pattern Recognit..

[3]  Jim Z. C. Lai,et al.  Fast global k-means clustering using cluster membership and inequality , 2010, Pattern Recognit..

[4]  Mohammad Reza Meybodi,et al.  Efficient stochastic algorithms for document clustering , 2013, Inf. Sci..

[5]  Janusz Zalewski,et al.  Rough sets: Theoretical aspects of reasoning about data , 1996 .

[6]  Michael K. Ng,et al.  Agglomerative Fuzzy K-Means Clustering Algorithm with Selection of Number of Clusters , 2008, IEEE Transactions on Knowledge and Data Engineering.

[7]  R. Suganya,et al.  Data Mining Concepts and Techniques , 2010 .

[8]  M. Narasimha Murty,et al.  A near-optimal initial seed value selection in K-means means algorithm using a genetic algorithm , 1993, Pattern Recognit. Lett..

[9]  Jiye Liang,et al.  A new measure of uncertainty based on knowledge granulation for rough sets , 2009, Inf. Sci..

[10]  Edward Y. Chang,et al.  Parallel Spectral Clustering in Distributed Systems , 2011, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[11]  Abdolreza Hatamlou,et al.  Black hole: A new heuristic optimization approach for data clustering , 2013, Inf. Sci..

[12]  Michael J. Laszlo,et al.  A genetic algorithm using hyper-quadtrees for low-dimensional k-means clustering , 2006, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[13]  Yangyang Li,et al.  Gene transposon based clone selection algorithm for automatic clustering , 2012, Inf. Sci..

[14]  Richard O. Duda,et al.  Pattern classification and scene analysis , 1974, A Wiley-Interscience publication.

[15]  J. MacQueen Some methods for classification and analysis of multivariate observations , 1967 .

[16]  Lei Zhang,et al.  A novel ant-based clustering algorithm using the kernel method , 2011, Inf. Sci..

[17]  Z. H. Che,et al.  Clustering and selecting suppliers based on simulated annealing algorithms , 2012, Comput. Math. Appl..

[18]  Srinivasan Parthasarathy,et al.  Fast mining of distance-based outliers in high-dimensional datasets , 2008, Data Mining and Knowledge Discovery.

[19]  Dervis Karaboga,et al.  A modified Artificial Bee Colony algorithm for real-parameter optimization , 2012, Inf. Sci..

[20]  Nikos A. Vlassis,et al.  The global k-means clustering algorithm , 2003, Pattern Recognit..

[21]  M. Narasimha Murty,et al.  Genetic K-means algorithm , 1999, IEEE Trans. Syst. Man Cybern. Part B.

[22]  Zülal Güngör,et al.  K-harmonic means data clustering with simulated annealing heuristic , 2007, Appl. Math. Comput..

[23]  Yiyu Yao,et al.  MGRS: A multi-granulation rough set , 2010, Inf. Sci..

[24]  Adil M. Bagirov,et al.  Modified global k-means algorithm for minimum sum-of-squares clustering problems , 2008, Pattern Recognit..

[25]  R. J. Kuo,et al.  Integration of particle swarm optimization and genetic algorithm for dynamic clustering , 2012, Inf. Sci..

[26]  Hui Xiong,et al.  K-means clustering versus validation measures: a data distribution perspective , 2006, KDD '06.

[27]  Fakhri Karray,et al.  Model order selection for multiple cooperative swarms clustering using stability analysis , 2008, 2008 IEEE Congress on Evolutionary Computation (IEEE World Congress on Computational Intelligence).