Improving the Efficiency and Efficacy of the K-means Clustering Algorithm Through a New Convergence Condition

Clustering problems arise in many different applications: machine learning, data mining, knowledge discovery, data compression, vector quantization, pattern recognition and pattern classification. One of the most popular and widely studied clustering methods is K-means. Several improvements to the standard K-means algorithm have been carried out, most of them related to the initial parameter values. In contrast, this article proposes an improvement using a new convergence condition that consists of stopping the execution when a local optimum is found or no more object exchanges among groups can be performed. For assessing the improvement attained, the modified algorithm (Early Stop K-means) was tested on six databases of the UCI repository, and the results were compared against SPSS, Weka and the standard K-means algorithm. Experimentally Early Stop K-means obtained important reductions in the number of iterations and improvements in the solution quality with respect to the other algorithms.

[1]  Douglas H. Fisher,et al.  Knowledge Acquisition Via Incremental Conceptual Clustering , 1987, Machine Learning.

[2]  Pat Langley,et al.  Editorial: On Machine Learning , 1986, Machine Learning.

[3]  David M. Mount,et al.  A local search approximation algorithm for k-means clustering , 2002, SCG '02.

[4]  Andrew W. Moore,et al.  X-means: Extending K-means with Efficient Estimation of the Number of Clusters , 2000, ICML.

[5]  Christopher J. Merz,et al.  UCI Repository of Machine Learning Databases , 1996 .

[6]  Chien-Hsing Chou,et al.  Short Papers , 2001 .

[7]  Stephen R. Garner,et al.  WEKA: The Waikato Environment for Knowledge Analysis , 1996 .

[8]  Marina Meila,et al.  An Experimental Comparison of Several Clustering and Initialization Methods , 1998, UAI.

[9]  Greg Hamerly,et al.  Alternatives to the k-means algorithm that find better clusterings , 2002, CIKM '02.

[10]  Catherine Blake,et al.  UCI Repository of machine learning databases , 1998 .

[11]  Nikos A. Vlassis,et al.  The global k-means clustering algorithm , 2003, Pattern Recognit..

[12]  Mehmed Kantardzic,et al.  Data Mining: Concepts, Models, Methods, and Algorithms , 2002 .

[13]  Richard O. Duda,et al.  Pattern classification and scene analysis , 1974, A Wiley-Interscience publication.

[14]  D.M. Mount,et al.  An Efficient k-Means Clustering Algorithm: Analysis and Implementation , 2002, IEEE Trans. Pattern Anal. Mach. Intell..

[15]  Yoshua Bengio,et al.  Convergence Properties of the K-Means Algorithms , 1994, NIPS.

[16]  J. MacQueen Some methods for classification and analysis of multivariate observations , 1967 .

[17]  Pedro Larrañaga,et al.  An empirical comparison of four initialization methods for the K-Means algorithm , 1999, Pattern Recognit. Lett..