Research on K-means Algorithm Optimization based on Compression Learning

The K-means algorithm is one of the classical clustering algorithms. However, its computational cost grows as the data set becomes larger. Orthogonal matching pursuit (OMP) is a classic signal reconstruction algorithm. Based on compression learning, this paper improves OMP and applies it to K-means: the cluster centers are estimated from a sketch of the original data set instead of from the full data. Because the size of the sketch is independent of the size of the original data set and depends only on the number of centroids K and the data dimension n, the computational complexity of the algorithm is reduced. Experimental results show that the clustering quality of this method is similar to that of the standard K-means algorithm, and that on large data sets the improved algorithm performs better than the traditional K-means algorithm.
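The abstract does not spell out the improved OMP algorithm, so the following Python sketch only illustrates the general idea under stated assumptions. It assumes the sketch is the empirical characteristic function of the data sampled at m random Gaussian frequencies (a common choice in sketch-based K-means work), and it replaces the paper's decoder with a plain OMP over a finite grid of candidate centroids, using real least-squares weights. The function names (`compute_sketch`, `atoms`, `omp_decode`), the frequency scale, m = 200 and the grid step of 0.25 are all hypothetical. The point to notice is that `z` has length m regardless of the number of points N, so decoding never touches the data again.

```python
import numpy as np


def compute_sketch(X, Omega):
    """Empirical characteristic function of X sampled at the m columns of Omega.

    X: (N, n) data, Omega: (n, m) frequencies. Returns a complex vector of
    length m; its size depends on m only, not on N.
    """
    return np.exp(-1j * (X @ Omega)).mean(axis=0)


def atoms(C, Omega):
    """Sketch of a single point mass at each candidate centroid.

    Real and imaginary parts are stacked so every atom is a real vector of
    length 2m; each atom has the same norm sqrt(m).
    """
    A = np.exp(-1j * (C @ Omega))          # (num_candidates, m)
    return np.hstack([A.real, A.imag]).T   # (2m, num_candidates)


def omp_decode(z, C, Omega, K):
    """Hypothetical decoder: plain OMP over a finite candidate dictionary."""
    D = atoms(C, Omega)
    y = np.concatenate([z.real, z.imag])
    support, residual = [], y.copy()
    for _ in range(K):
        # Correlation of each atom with the residual (atoms share one norm).
        scores = D.T @ residual
        if support:
            scores[support] = -np.inf      # never select a candidate twice
        support.append(int(np.argmax(scores)))
        # Orthogonal step: refit all weights on the current support.
        weights, *_ = np.linalg.lstsq(D[:, support], y, rcond=None)
        residual = y - D[:, support] @ weights
    return C[support], weights


# Synthetic data: three well-separated Gaussian blobs in n = 2 dimensions.
rng = np.random.default_rng(0)
true_centers = np.array([[0.0, 0.0], [5.0, 5.0], [-5.0, 5.0]])
X = np.vstack([c + 0.3 * rng.standard_normal((2000, 2)) for c in true_centers])

m = 200                                    # sketch size (assumed)
Omega = rng.standard_normal((2, m))        # frequencies ~ N(0, I), i.e. scale 1 (assumed)
z = compute_sketch(X, Omega)               # one pass over the data

# Candidate grid over the bounding box (also computable in that single pass).
lo, hi = X.min(axis=0), X.max(axis=0)
gx, gy = np.meshgrid(np.arange(lo[0], hi[0], 0.25), np.arange(lo[1], hi[1], 0.25))
C = np.column_stack([gx.ravel(), gy.ravel()])

centers, weights = omp_decode(z, C, Omega, K=3)
```

Each true center should be matched by one recovered centroid up to roughly the grid step. A finer grid, or a continuous local refinement of each selected centroid, would reduce that error at extra cost; the sketch itself stays the same size either way.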

Cai Shuai | Zhao Xiao | Zhu Lei | Weijun Zeng | Re Yu