An accelerated K-means clustering algorithm using selection and erasure rules

The K-means method is a well-known clustering algorithm with a wide range of applications, such as biological classification, disease analysis, data mining, and image compression. However, plain K-means becomes slow when the number of clusters or the number of data points is large. Fahim et al. (2006) presented a modified K-means algorithm that produces clusters whose mean square error is very similar to that of plain K-means while requiring less execution time. In this study, we increase its speed further. Our method uses two rules: a selection rule, which picks a good candidate as the initial center to be checked, and an erasure rule, which deletes one or more unqualified centers each time a specified condition is satisfied. Our clustering results are identical to those of Fahim et al. (2006), but our method further reduces computation time as the number of clusters increases. The mathematical reasoning behind our design is included.
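For orientation, the following is a minimal Python sketch of the kind of shortcut commonly attributed to Fahim et al. (2006): a point stays in its cluster, without a search over all centers, when its distance to that cluster's updated center has not grown since the previous iteration. This is an illustration under stated assumptions, not the method of this paper. The selection and erasure rules are not specified in the abstract and are not implemented here, and the names `sq_dist`, `nearest`, and `kmeans_with_skip` are hypothetical.

```python
def sq_dist(p, q):
    """Squared Euclidean distance between two equal-length tuples."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def nearest(p, centers):
    """Full search: index of the closest center and the squared distance to it."""
    best = min(range(len(centers)), key=lambda j: sq_dist(p, centers[j]))
    return best, sq_dist(p, centers[best])


def kmeans_with_skip(points, centers, max_iter=100):
    """K-means with a 'stay if not farther' shortcut (assumed heuristic).

    Returns (centers, labels, full_searches), where full_searches counts
    how many times a point was compared against every center.
    """
    centers = [tuple(c) for c in centers]
    k = len(centers)

    # Initial assignment always needs a full search for every point.
    labels, prev = [], []
    for p in points:
        j, d = nearest(p, centers)
        labels.append(j)
        prev.append(d)
    full_searches = len(points)

    for _ in range(max_iter):
        # Update step: each center moves to the mean of its members.
        new_centers = []
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:
                mean = tuple(sum(c) / len(members) for c in zip(*members))
                new_centers.append(mean)
            else:
                new_centers.append(centers[j])  # keep an empty cluster's center
        if new_centers == centers:
            break  # centers stopped moving
        centers = new_centers

        # Assignment step with the shortcut.
        for i, p in enumerate(points):
            d_own = sq_dist(p, centers[labels[i]])
            if d_own > prev[i]:
                # The point drifted away from its center: search all centers.
                labels[i], d_own = nearest(p, centers)
                full_searches += 1
            prev[i] = d_own

    return centers, labels, full_searches
```

A possible call, using two well-separated groups of three points each and one starting center taken from each group:

```python
pts = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
centers, labels, full_searches = kmeans_with_skip(pts, [(0, 0), (10, 10)])
```

Because the shortcut skips some full searches, the result is a heuristic approximation of plain K-means, which is consistent with the abstract's remark that the mean square error is very similar rather than identical.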

[1] Tzung-Pei Hong, et al. Mining Outliers in Correlated Subspaces for High Dimensional Data Sets, 2010, Fundam. Informaticae.

[2] Ran-Zan Wang, et al. An image-hiding method with high hiding capacity based on best-block matching and k-means clustering, 2007, Pattern Recognit.

[3] Richard Weber, et al. A methodology for dynamic data mining based on fuzzy clustering, 2005, Fuzzy Sets Syst.

[4] Abdel-Badeeh M. Salem, et al. An efficient enhanced k-means clustering algorithm, 2006.

[5] Wan-Jui Lee, et al. Vector Quantization of Images Using a Fuzzy Clustering Method, 2007, Cybern. Syst.

[6] Shanan Zhu, et al. Multi-face detection based on downsampling and modified subtractive clustering for color images, 2007.

[7] Ja-Chen Lin. Multi-Class Clustering by Analytical Two-Class Formulas, 1996, Int. J. Pattern Recognit. Artif. Intell.

[8] Dorothea Emig, et al. Partitioning biological data with transitivity clustering, 2010, Nature Methods.

[9] Meena Mahajan, et al. The Planar k-means Problem is NP-hard, 2009.

[10] Hwei-Jen Lin, et al. An Efficient GA-based Clustering Technique, 2005.

[11] 郭继东, et al. A statistical information-based clustering approach in distance space, 2005.

[12] Sergios Theodoridis, et al. Chapter 13: Clustering Algorithms II: Hierarchical Algorithms, 2006.

[13] S. Horvath, et al. Global histone modification patterns predict risk of prostate cancer recurrence, 2005, Nature.

[14] Ja-Chen Lin, et al. Secret Image Sharing based on Vector Quantization, 2009.

[15] Jing-Yu Yang, et al. Hierarchical initialization approach for K-Means clustering, 2008, Pattern Recognit. Lett.

[16] Sergios Theodoridis, et al. Clustering Algorithms II: Hierarchical Algorithms, 2009.