Probabilistic K-Means Using the Method of Moments

K-means is one of the most widely used clustering algorithms in data mining: it seeks cluster assignments and centers that minimize the within-cluster sum of squared Euclidean distances between each point and its cluster mean. The simplicity and scalability of K-means make it very appealing. However, K-means suffers from the local-minima problem and offers no guarantee of converging to the optimal cost. K-means++ addresses this by seeding the means with a distance-based sampling scheme, but its seeding step requires O(K) passes over the entire dataset, which can be prohibitively expensive on large datasets. Here we propose a method for seeding the initial means based on higher-order moments of the data, which requires only O(1) passes over the entire dataset to extract the initial set of means. Our method yields performance competitive with existing K-means algorithms while avoiding the expensive mean-selection steps of K-means++ and other heuristics. We demonstrate the performance of our algorithm in comparison with existing algorithms on various benchmark datasets.
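The abstract does not spell out the moment-based seeding procedure itself. One standard route in the method-of-moments literature, however, is to whiten the data with an adjusted second moment and then run tensor power iteration on the whitened third moment, as in spectral algorithms for mixtures of spherical Gaussians. The sketch below is a minimal illustration of that route only, under a spherical-Gaussian mixture assumption with shared variance; it is not the paper's exact estimator, and the function name moment_seed_means and all defaults are hypothetical.

# A minimal sketch of moment-based seeding (NOT the paper's exact estimator):
# whiten with the adjusted second moment, then run tensor power iteration on
# the whitened third moment, following the spherical-Gaussian-mixture moment
# analysis. All names and parameter defaults are illustrative assumptions.

import numpy as np

def moment_seed_means(X, k, n_power_iters=100, n_restarts=10, seed=0):
    """Estimate k initial means from the 2nd/3rd moments of X (n x d)."""
    rng = np.random.default_rng(seed)
    n, d = X.shape
    m = X.mean(axis=0)

    # Under a spherical mixture with shared variance, sigma^2 is the
    # smallest eigenvalue of the sample covariance.
    sigma2 = np.linalg.eigvalsh(np.cov(X, rowvar=False, bias=True)).min()

    # Adjusted second moment M2 = E[x x^T] - sigma^2 I; its top-k
    # eigenspace gives a whitening map W with W^T M2 W = I_k.
    M2 = (X.T @ X) / n - sigma2 * np.eye(d)
    evals, evecs = np.linalg.eigh(M2)
    order = np.argsort(evals)[::-1][:k]
    D, U = np.maximum(evals[order], 1e-12), evecs[:, order]
    W = U / np.sqrt(D)                     # d x k whitening map

    # Whitened third moment T = M3(W, W, W), built explicitly (k is small).
    Y = X @ W                              # n x k whitened samples
    T = np.einsum('ni,nj,nl->ijl', Y, Y, Y) / n
    mw, G = W.T @ m, W.T @ W
    T -= sigma2 * (np.einsum('a,bc->abc', mw, G)
                   + np.einsum('b,ac->abc', mw, G)
                   + np.einsum('c,ab->abc', mw, G))

    # Tensor power iteration with deflation: one (eigenvalue, eigenvector)
    # pair per mixture component, then map back to the original space.
    B = U * np.sqrt(D)                     # (W^T)^+, maps R^k back to R^d
    means = []
    for _ in range(k):
        best_v, best_lam = None, -np.inf
        for _ in range(n_restarts):
            v = rng.standard_normal(k)
            v /= np.linalg.norm(v)
            for _ in range(n_power_iters):
                v = np.einsum('ijl,j,l->i', T, v, v)
                v /= np.linalg.norm(v)
            lam = np.einsum('ijl,i,j,l->', T, v, v, v)
            if lam > best_lam:
                best_v, best_lam = v, lam
        means.append(best_lam * (B @ best_v))
        T -= best_lam * np.einsum('i,j,l->ijl', best_v, best_v, best_v)
    return np.array(means)                 # k x d initial centers

A hypothetical way to use such seeds is as the initial centers of an off-the-shelf K-means implementation, e.g. scikit-learn's KMeans(n_clusters=k, init=seeds, n_init=1), after which ordinary Lloyd iterations refine them. Note that the empirical moments above can be accumulated in a single pass over the data, which is consistent with the O(1)-pass claim in the abstract.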
