An Expansion of X-Means for Automatically Determining the Optimal Number of Clusters: Progressive Iterations of K-Means and Merging of the Clusters

We extend a non-hierarchical clustering algorithm that determines the optimal number of clusters by iterating k-means with a stopping rule based on the Bayesian Information Criterion (BIC). The procedure merges clusters produced by the k-means iterations to avoid unsuitable divisions caused by the order in which clusters are divided. With this additional merging operation, adequate clustering was obtained in more cases across various types of simulation runs. With no prior information about the number of clusters, our method obtains the optimal clustering based on information theory rather than on a heuristic. The computational complexity of our method is O(N log k) for sample size N and number of final clusters k.
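The split-then-merge idea described above can be illustrated with a minimal sketch. It is not the paper's implementation: it assumes one-dimensional data, a deterministic 2-means seeded at the minimum and maximum, a plain per-cluster Gaussian BIC (lower is better), and a merge test restricted to neighbouring clusters. All function names (`bic`, `two_means`, `split`, `merge`, `x_means_1d`) and the variance floor `MIN_VAR` are hypothetical choices made for illustration.

```python
import math

MIN_VAR = 1e-6  # variance floor (assumption) so that log() stays finite


def bic(clusters):
    """BIC of a partition with one 1-D Gaussian per cluster (lower is better)."""
    n = sum(len(c) for c in clusters)
    log_lik = 0.0
    for c in clusters:
        m = len(c)
        mean = sum(c) / m
        var = max(sum((x - mean) ** 2 for x in c) / m, MIN_VAR)
        # Maximised log-likelihood of the cluster, including its weight m/n.
        log_lik += m * math.log(m / n) - 0.5 * m * (math.log(2 * math.pi * var) + 1)
    k = len(clusters)
    n_params = 2 * k + (k - 1)  # means, variances, free mixing weights
    return -2 * log_lik + n_params * math.log(n)


def two_means(xs):
    """Deterministic 2-means in 1-D, seeded at the minimum and maximum."""
    c0, c1 = min(xs), max(xs)
    while True:
        a = [x for x in xs if abs(x - c0) <= abs(x - c1)]
        b = [x for x in xs if abs(x - c0) > abs(x - c1)]
        if not b:  # all points identical: nothing to split
            return a, b
        n0, n1 = sum(a) / len(a), sum(b) / len(b)
        if (n0, n1) == (c0, c1):  # centres unchanged: converged
            return a, b
        c0, c1 = n0, n1


def split(xs):
    """Recursively bisect a cluster while the two-cluster BIC is strictly lower."""
    if len(xs) < 4:
        return [xs]
    a, b = two_means(xs)
    if len(a) < 2 or len(b) < 2 or bic([a, b]) >= bic([xs]):
        return [xs]  # stopping rule: the split does not improve BIC
    return split(a) + split(b)


def merge(clusters):
    """Merge neighbouring clusters whenever one Gaussian scores no worse than two."""
    clusters = sorted(clusters, key=lambda c: sum(c) / len(c))
    i = 0
    while i < len(clusters) - 1:
        a, b = clusters[i], clusters[i + 1]
        if bic([a + b]) <= bic([a, b]):
            clusters[i:i + 2] = [a + b]
            i = max(i - 1, 0)  # re-check against the previous neighbour
        else:
            i += 1
    return clusters


def x_means_1d(xs):
    """Progressive splitting followed by the merging pass."""
    return merge(split(list(xs)))


if __name__ == "__main__":
    data = [0.0, 0.5, 1.0, 1.5, 2.0, 10.0, 10.5, 11.0, 11.5, 12.0]
    print(len(x_means_1d(data)))
```

In this sketch the merging pass runs once after all splits are finished, so a group that was cut apart by an early division can be rejoined if a single Gaussian explains it at least as well as two. The likelihood, the parameter count, and the choice of candidate pairs to merge are simplifications and may differ from the formulation used in the paper.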
