GK-means: an Efficient K-means Clustering Algorithm Based on Grid

Clustering analysis is an important tool in many applications, including pattern recognition, data mining, machine learning, and statistics. K-means clustering, which minimizes a formal objective function, is perhaps the most widely used and studied clustering method. However, it requires the user to specify the number of clusters k, effective initial centers are difficult to select, and it is sensitive to noisy data points. In this paper, we focus on choosing better initial centers, both to improve the quality of the k-means result and to reduce the computational complexity of the method. The proposed algorithm, called GK-means, combines a grid structure and a spatial index with the k-means algorithm. Theoretical analysis and experimental results show that the algorithm achieves high clustering quality and efficiency.

Keywords: data mining; clustering analysis; k-means algorithm; grid technology
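
To make the idea of grid-based seeding concrete, the following is a minimal Python sketch, not the GK-means procedure itself. It rests on several assumptions that go beyond the abstract: the data space is split into a uniform grid, the initial centers are the means of the k most populated cells, and standard Lloyd iterations then minimize the usual sum-of-squared-errors objective. The names `grid_initial_centers`, `kmeans`, and `cells_per_dim` are hypothetical, and the spatial index mentioned above is not modeled.

```python
import numpy as np


def grid_initial_centers(points, k, cells_per_dim=4):
    """Return k seeds: the means of the k most populated grid cells.

    Illustrative assumption only; the paper's actual seeding rule may differ.
    """
    points = np.asarray(points, dtype=float)
    lo = points.min(axis=0)
    span = points.max(axis=0) - lo
    span[span == 0] = 1.0  # guard against constant dimensions
    # Map each point to an integer cell index in every dimension.
    cells = ((points - lo) / span * cells_per_dim).astype(int)
    cells = np.minimum(cells, cells_per_dim - 1)  # max value falls in last cell
    # Group points by cell and count the points in each occupied cell.
    uniq, inverse, counts = np.unique(
        cells, axis=0, return_inverse=True, return_counts=True
    )
    inverse = inverse.ravel()
    if len(uniq) < k:
        raise ValueError("fewer occupied cells than k; use a finer grid")
    densest = np.argsort(-counts, kind="stable")[:k]
    return np.array([points[inverse == c].mean(axis=0) for c in densest])


def kmeans(points, centers, iterations=20):
    """Plain Lloyd iterations starting from the given centers."""
    points = np.asarray(points, dtype=float)
    centers = np.asarray(centers, dtype=float).copy()
    for _ in range(iterations):
        # Squared Euclidean distance from every point to every center.
        d2 = ((points[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = d2.argmin(axis=1)
        for j in range(len(centers)):
            members = points[labels == j]
            if len(members):  # keep the old center if a cluster is empty
                centers[j] = members.mean(axis=0)
    # Final assignment and objective value (sum of squared errors).
    d2 = ((points[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
    labels = d2.argmin(axis=1)
    sse = d2[np.arange(len(points)), labels].sum()
    return centers, labels, sse


if __name__ == "__main__":
    data = [[0, 0], [0, 1], [1, 0], [1, 1],
            [10, 10], [10, 11], [11, 10], [11, 11]]
    seeds = grid_initial_centers(data, k=2)
    centers, labels, sse = kmeans(data, seeds)
    print(centers, labels, sse)
```

Seeding from dense cells places the starting centers where the data concentrates, which is one plausible way a grid could reduce both poor local optima and the number of iterations. In this sketch a noise filter could be approximated by ignoring cells below a minimum count, though that threshold is likewise an assumption.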