K-harmonic means data clustering with simulated annealing heuristic

Abstract: Clustering procedures partition a set of objects into clusters such that objects in the same cluster are more similar to each other than to objects in different clusters, according to some predefined criterion. Clustering is a popular data analysis and data mining technique. Because the clustering problem is NP-complete in nature, the larger the problem, the harder it is to find the optimal solution and the longer it takes to reach a reasonable result. One of the most widely used clustering techniques is based on K-means, in which the data are partitioned into K clusters. Although the K-means algorithm is easy to implement and runs fast in most situations, it suffers from two major drawbacks: sensitivity to initialization and convergence to local optima. Previous studies show that K-harmonic means clustering solves the initialization problem, but because of its greedy search nature the second problem, convergence to local optima, remains. In this paper we develop a new algorithm for this problem based on simulated annealing: simulated annealing K-harmonic means clustering (SAKHMC). Experimental results on the Iris data set and other well-known data sets illustrate the robustness of the SAKHMC clustering algorithm.
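
The abstract does not give the details of SAKHMC, so the following is only a minimal sketch of the general idea: the K-harmonic means objective is evaluated for a set of centers, and a generic simulated annealing loop perturbs one center at a time, accepting worse solutions with the Metropolis probability so that the search can leave local optima. The function names, the Gaussian neighbor move, the geometric cooling schedule, and the default values of `p`, `t0`, `cooling`, `steps`, and `step_size` are all assumptions made for illustration and are not taken from the paper.

```python
import math
import random


def khm_objective(points, centers, p=2.0, eps=1e-12):
    """K-harmonic means cost: sum over points of K / sum_k (1 / d(x, c_k)^p)."""
    k = len(centers)
    total = 0.0
    for x in points:
        inv_sum = 0.0
        for c in centers:
            d = max(math.dist(x, c), eps)  # guard against a zero distance
            inv_sum += 1.0 / d ** p
        total += k / inv_sum
    return total


def anneal_centers(points, k, p=2.0, t0=1.0, cooling=0.95,
                   steps=2000, step_size=0.5, seed=0):
    """Hypothetical annealing loop over cluster centers (not the paper's exact method)."""
    rng = random.Random(seed)
    # Start from k distinct data points chosen at random.
    current = [list(x) for x in rng.sample(points, k)]
    cur_cost = khm_objective(points, current, p)
    best, best_cost = [c[:] for c in current], cur_cost
    t = t0
    for _ in range(steps):
        # Neighbor move (assumed): add Gaussian noise to one randomly chosen center.
        cand = [c[:] for c in current]
        i = rng.randrange(k)
        cand[i] = [v + rng.gauss(0.0, step_size) for v in cand[i]]
        cost = khm_objective(points, cand, p)
        delta = cost - cur_cost
        # Metropolis rule: always accept improvements, sometimes accept worse moves.
        if delta <= 0 or rng.random() < math.exp(-delta / t):
            current, cur_cost = cand, cost
            if cur_cost < best_cost:
                best, best_cost = [c[:] for c in current], cur_cost
        t = max(t * cooling, 1e-9)  # geometric cooling (assumed schedule)
    return best, best_cost
```

Calling `anneal_centers(points, 3)` on a list of equal-length numeric tuples returns the best centers found and their objective value. A fuller variant might alternate these random moves with the deterministic K-harmonic means center update, but whether SAKHMC does so is not stated in the abstract.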
