Paper Information - GK-means: an Efficient K-means Clustering Algorithm Based on Grid

Clustering analysis is an important tool in many applications, such as pattern recognition, data mining, machine learning, and statistics. K-means clustering, which minimizes a formal objective function, is perhaps the most widely used and studied clustering method. However, the user must specify the number of clusters k, effective initial centers are difficult to select, and the method is sensitive to noisy data points. In this paper, we focus on choosing better initial centers to improve the quality of k-means and to reduce its computational complexity. The proposed algorithm, called GK-means, combines a grid structure and a spatial index with the k-means algorithm. Theoretical analysis and experimental results show that the algorithm achieves high quality and efficiency.

Keywords: data mining; clustering analysis; k-means algorithm; grid technology
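The abstract does not spell out how the grid is used, so the following is only a minimal sketch of one plausible reading, not the authors' procedure: partition the data space into a uniform grid, take the centroids of the most populated cells as initial centers, and hand those to standard k-means. The function names `grid_initial_centers` and `kmeans` and the parameter `cells_per_dim` are hypothetical, and the spatial index mentioned in the abstract is not modeled here.

```python
from collections import defaultdict
import math


def grid_initial_centers(points, k, cells_per_dim=4):
    """Assumed seeding rule: centroids of the k most populated grid cells.

    Returns fewer than k centers if fewer than k cells are non-empty.
    """
    dims = len(points[0])
    lows = [min(p[d] for p in points) for d in range(dims)]
    highs = [max(p[d] for p in points) for d in range(dims)]

    # Bucket every point into its grid cell.
    cells = defaultdict(list)
    for p in points:
        key = []
        for d in range(dims):
            span = highs[d] - lows[d]
            if span == 0:
                idx = 0
            else:
                # Clamp so the maximum coordinate falls in the last cell.
                idx = min(int((p[d] - lows[d]) / span * cells_per_dim),
                          cells_per_dim - 1)
            key.append(idx)
        cells[tuple(key)].append(p)

    # Densest cells first; ties keep insertion order.
    densest = sorted(cells.values(), key=len, reverse=True)[:k]
    return [tuple(sum(col) / len(cell) for col in zip(*cell))
            for cell in densest]


def kmeans(points, centers, max_iter=100):
    """Standard Lloyd iterations starting from the given centers."""
    centers = list(centers)
    clusters = []
    for _ in range(max_iter):
        clusters = [[] for _ in centers]
        for p in points:
            nearest = min(range(len(centers)),
                          key=lambda i: math.dist(p, centers[i]))
            clusters[nearest].append(p)
        # An empty cluster keeps its previous center.
        new_centers = [
            tuple(sum(col) / len(c) for col in zip(*c)) if c else centers[i]
            for i, c in enumerate(clusters)
        ]
        if new_centers == centers:
            break
        centers = new_centers
    return centers, clusters


if __name__ == "__main__":
    data = [(0.0, 0.0), (0.5, 0.2), (0.2, 0.4),
            (10.0, 10.0), (9.8, 9.9), (9.9, 9.7)]
    seeds = grid_initial_centers(data, k=2)
    final_centers, final_clusters = kmeans(data, seeds)
    print(final_centers)
```

A known weakness of this simple rule is that two adjacent dense cells may belong to the same natural cluster; a fuller treatment would merge neighboring dense cells or enforce a minimum separation between seeds.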

Xiaoyun Chen | Yi Chen | Guohua Liu | Youli Su
