Modified global k-means algorithm for minimum sum-of-squares clustering problems

The k-means algorithm and its variations are known to be fast clustering algorithms. However, they are sensitive to the choice of starting points and are inefficient for solving clustering problems in large data sets. Recently, a new version of the k-means algorithm, the global k-means algorithm, has been developed. It is an incremental algorithm that dynamically adds one cluster center at a time and uses each data point as a candidate for the k-th cluster center. Results of numerical experiments show that the global k-means algorithm considerably outperforms the k-means algorithm. In this paper, a new version of the global k-means algorithm is proposed. In this algorithm, a starting point for the k-th cluster center is computed by minimizing an auxiliary cluster function. Results of numerical experiments on 14 data sets demonstrate the superiority of the new algorithm; however, it requires more computational time than the global k-means algorithm.
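The abstract describes both procedures only in words. The Python sketch below illustrates them under stated assumptions. With `modified=False`, the hypothetical `global_kmeans` function follows the global k-means description: each data point is tried as the starting position of the k-th center, k-means is run, and the best result is kept. With `modified=True`, it instead computes one starting point by locally minimizing an auxiliary cluster function. The abstract does not define that function, so the sketch assumes the form f̄(y) = Σ_a min(d_a, ‖y − a‖²), where d_a is the squared distance from data point a to its nearest existing center. It also uses a single start, which may be simpler than the published method. All function names (`sqdist`, `mean`, `mssc`, `kmeans`, `aux_start`, `global_kmeans`) are illustrative, not taken from the paper.

```python
from typing import List, Sequence, Tuple

Point = Tuple[float, ...]


def sqdist(a: Sequence[float], b: Sequence[float]) -> float:
    """Squared Euclidean distance."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def mean(points: Sequence[Sequence[float]]) -> Point:
    """Coordinate-wise centroid of a non-empty set of points."""
    return tuple(sum(coord) / len(points) for coord in zip(*points))


def mssc(data: Sequence[Point], centers: Sequence[Point]) -> float:
    """Sum-of-squares objective: each point pays its squared distance to the nearest center."""
    return sum(min(sqdist(a, c) for c in centers) for a in data)


def kmeans(data: Sequence[Point], centers: Sequence[Point], max_iter: int = 100) -> List[Point]:
    """Plain Lloyd-style k-means started from the given centers."""
    centers = [tuple(c) for c in centers]
    for _ in range(max_iter):
        clusters: List[List[Point]] = [[] for _ in centers]
        for a in data:
            j = min(range(len(centers)), key=lambda i: sqdist(a, centers[i]))
            clusters[j].append(a)
        # An empty cluster keeps its previous center.
        new = [mean(cl) if cl else c for cl, c in zip(clusters, centers)]
        if new == centers:
            break
        centers = new
    return centers


def aux_start(data: Sequence[Point], centers: Sequence[Point], max_iter: int = 100) -> Point:
    """Starting point for the next center (assumption): descend the auxiliary function
    f(y) = sum_a min(d_a, ||y - a||^2) while the existing centers stay fixed."""
    d = [min(sqdist(a, c) for c in centers) for a in data]

    def decrease(y: Sequence[float]) -> float:
        # Drop in the objective if y were added as a new center.
        return sum(max(0.0, da - sqdist(y, a)) for a, da in zip(data, d))

    y = tuple(max(data, key=decrease))  # best data point as the initial guess
    for _ in range(max_iter):
        # Points that y would take over from their current nearest center.
        captured = [a for a, da in zip(data, d) if sqdist(y, a) < da]
        if not captured:
            break
        new_y = mean(captured)  # minimizes f for this fixed capture set
        if new_y == y:
            break
        y = new_y
    return y


def global_kmeans(data: Sequence[Point], k: int, modified: bool = False) -> List[Point]:
    centers: List[Point] = [mean(data)]  # k = 1: the centroid is optimal
    for _ in range(1, k):
        if modified:
            # Modified variant (sketch): a single start from the auxiliary function.
            centers = kmeans(data, centers + [aux_start(data, centers)])
        else:
            # Global k-means: every data point is tried as the new center.
            runs = [kmeans(data, centers + [tuple(a)]) for a in data]
            centers = min(runs, key=lambda cs: mssc(data, cs))
    return centers


if __name__ == "__main__":
    # Three well-separated unit squares.
    data = [(0, 0), (0, 1), (1, 0), (1, 1), (10, 10), (10, 11),
            (11, 10), (11, 11), (20, 0), (20, 1), (21, 0), (21, 1)]
    print(sorted(global_kmeans(data, 3, modified=True)))
    # [(0.5, 0.5), (10.5, 10.5), (20.5, 0.5)]
```

On this toy data both variants recover the three square centers with objective value 6.0. The sketch makes no claim about the accuracy or running time of the authors' algorithm on the 14 data sets mentioned above.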
