Sparse probabilistic K-means

Abstract The goal of clustering is to partition a set of data points into groups of similar points, called clusters. Clustering algorithms fall into two categories: hard and soft clustering. Hard clustering assigns each data point exclusively to one cluster, whereas soft clustering allows probabilistic assignments to clusters. In this paper, we propose a new model that combines the benefits of both: the clarity of hard clustering and the probabilistic assignments of soft clustering. Since the majority of data points usually have a clear cluster association, only a few points may require a probabilistic interpretation. We therefore apply an l1-norm constraint to impose sparsity on the probabilistic assignments. Moreover, we incorporate outlier detection into the clustering model so that outliers, which can cause serious problems in statistical analyses, are detected simultaneously. To optimize the model, we introduce an alternating minimization method and prove its convergence. Numerical experiments and comparisons with existing models demonstrate the soundness and effectiveness of the proposed model.
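To make the idea of sparse probabilistic assignments concrete, the sketch below is a rough, simplified illustration only: it alternates between a soft assignment step, a projection of each assignment vector onto the probability simplex (which yields sparse rows when the scaling parameter is large), and a weighted centroid update. The function names, the temperature parameter beta, and the absence of an outlier term are all simplifying assumptions and do not reproduce the paper's exact objective or its alternating minimization scheme.

```python
import numpy as np

def project_simplex(v):
    """Euclidean projection of v onto the probability simplex (standard sort-based algorithm)."""
    u = np.sort(v)[::-1]
    css = np.cumsum(u)
    rho = np.nonzero(u * np.arange(1, len(v) + 1) > (css - 1.0))[0][-1]
    theta = (css[rho] - 1.0) / (rho + 1.0)
    return np.maximum(v - theta, 0.0)

def sparse_soft_kmeans(X, k, n_iter=50, beta=5.0, seed=0):
    """Hypothetical simplified sketch, not the paper's model.

    X: (n, d) data array, k: number of clusters,
    beta: scaling that controls how peaked (sparse) the assignments become.
    """
    rng = np.random.default_rng(seed)
    centers = X[rng.choice(len(X), size=k, replace=False)]
    for _ in range(n_iter):
        # squared distances from every point to every center, shape (n, k)
        d2 = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(-1)
        # soft assignment step: project scaled negative distances onto the
        # simplex; a larger beta drives more entries of each row to zero
        P = np.vstack([project_simplex(-beta * row) for row in d2])
        # centroid update weighted by the assignment probabilities
        w = P.sum(axis=0) + 1e-12
        centers = (P.T @ X) / w[:, None]
    return centers, P
```

In this toy version, most rows of P end up with a single nonzero entry (a hard assignment), while points near cluster boundaries retain a few nonzero probabilities, which mirrors the behavior the abstract describes.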
