Paper Information - Spherical k-Means++ Clustering

Spherical k-Means++ Clustering

The k-means clustering (KM) algorithm, also called the hard c-means clustering (HCM) algorithm, is a very powerful clustering algorithm [1, 2], but it depends strongly on its initial values. To reduce this dependence, Arthur and Vassilvitskii proposed the k-means++ clustering (KM++) algorithm in 2007 [3]. In many cases, however, each object lies on a unit sphere, e.g., in text clustering. Dhillon and Modha proposed a primitive form of the spherical k-means clustering algorithm to classify such objects in 2007 [4], and Hornik, Kober, and Buchta proposed a new spherical k-means clustering (SKM) algorithm in 2012 [5]. Both of these algorithms, however, share KM's dependence on initial values. This paper therefore addresses two points: (1) the dissimilarity of SKM is extended so that it satisfies the triangle inequality, and (2) a spherical k-means++ clustering (SKM++) algorithm that handles the initial-value problem is proposed. The paper shows that the effectiveness of SKM++ is theoretically guaranteed.
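To make the seeding idea concrete, the following is a minimal sketch of k-means++-style seeding for points on a unit sphere. It is not the paper's SKM++ algorithm: the abstract does not state the extended dissimilarity, so the sketch assumes the plain cosine dissimilarity 1 - <x, y>, which for unit vectors equals half the squared Euclidean distance. The function names (normalize, cosine_dissimilarity, seed_spherical_kmeanspp) are hypothetical and chosen for illustration only.

```python
import math
import random


def normalize(v):
    """Scale a vector to unit length so that it lies on the unit sphere."""
    norm = math.sqrt(sum(x * x for x in v))
    if norm == 0.0:
        raise ValueError("cannot normalize the zero vector")
    return [x / norm for x in v]


def cosine_dissimilarity(x, y):
    """Return 1 - <x, y> for unit vectors, clamped at 0 against rounding.

    Assumption: this stands in for the paper's extended dissimilarity,
    which the abstract does not specify.
    """
    return max(0.0, 1.0 - sum(a * b for a, b in zip(x, y)))


def seed_spherical_kmeanspp(points, k, rng=None):
    """Pick k initial centers from a list of unit vectors, k-means++ style.

    The first center is drawn uniformly at random. Each later center is
    drawn with probability proportional to a point's dissimilarity to the
    nearest center chosen so far.
    """
    rng = rng or random.Random()
    if not 1 <= k <= len(points):
        raise ValueError("k must be between 1 and the number of points")
    centers = [points[rng.randrange(len(points))]]
    # nearest[i] is the dissimilarity from points[i] to its closest center.
    nearest = [cosine_dissimilarity(p, centers[0]) for p in points]
    while len(centers) < k:
        if sum(nearest) == 0.0:
            # Every point coincides with a chosen center: fall back to uniform.
            idx = rng.randrange(len(points))
        else:
            idx = rng.choices(range(len(points)), weights=nearest, k=1)[0]
        centers.append(points[idx])
        nearest = [min(d, cosine_dissimilarity(p, points[idx]))
                   for d, p in zip(nearest, points)]
    return centers


# Usage: seed two centers from four normalized 2-D points.
data = [normalize(v) for v in [[1, 0], [0.9, 0.1], [0, 1], [0.1, 0.9]]]
centers = seed_spherical_kmeanspp(data, k=2, rng=random.Random(0))
```

The returned centers would then serve as the starting point for a spherical k-means iteration. Because a point's weight shrinks toward zero once a nearby center has been chosen, later seeds tend to land in regions of the sphere that are not yet covered.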

Sadaaki Miyamoto | Yasunori Endo

[1] J. MacQueen. Some methods for classification and analysis of multivariate observations, 1967.

[2] Inderjit S. Dhillon, et al. Concept Decompositions for Large Sparse Text Data Using Clustering, 2004, Machine Learning.

[3] Kurt Hornik, et al. Spherical k-Means Clustering, 2012.

[4] Sergei Vassilvitskii, et al. k-means++: the advantages of careful seeding, 2007, SODA '07.