Partial recovery bounds for clustering with the relaxed K-means

We investigate the clustering performance of the relaxed $K$-means in the setting of the sub-Gaussian Mixture Model (sGMM) and the Stochastic Block Model (SBM). After identifying the appropriate signal-to-noise ratio (SNR), we prove that the misclassification error decays exponentially fast with respect to this SNR. These partial recovery bounds for the relaxed $K$-means improve upon the results currently known in the sGMM setting. In the SBM setting, applying the relaxed $K$-means SDP makes it possible to handle general connection probabilities, whereas other SDPs investigated in the literature are restricted to the assortative case (where within-group connection probabilities are larger than between-group probabilities). Again, this partial recovery bound complements the state-of-the-art results. Altogether, these results highlight the versatility of the relaxed $K$-means.
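The relaxed $K$-means is commonly written, following Peng and Wei, as $\hat B \in \arg\max\{\langle XX^\top, B\rangle : B \succeq 0,\ B_{ab} \geq 0,\ \sum_b B_{ab} = 1,\ \mathrm{tr}(B) = K\}$, where $X$ is the $n \times p$ data matrix. This section does not state the exact constraint set, so treat this form as an assumption. The relaxation rests on the identity $\mathrm{Crit}(G) = \|X\|_F^2 - \langle XX^\top, B_G\rangle$. Here $B_G$ is the partition matrix, with $(B_G)_{ab} = 1/|G_k|$ when $a, b \in G_k$ and $0$ otherwise. Minimizing the $K$-means criterion is therefore the same as maximizing $\langle XX^\top, B\rangle$ over partition matrices, and the SDP enlarges that set to a convex one.

The pure-Python sketch below checks this identity on toy data. It also computes the misclassification error: the fraction of mislabeled points, minimized over relabelings of the estimate. It does not solve the SDP, and the function names are hypothetical, chosen only for illustration.

```python
from itertools import permutations
from typing import List, Sequence

Matrix = List[List[float]]


def partition_matrix(labels: Sequence[int], k: int) -> Matrix:
    """B_G with B[a][b] = 1/|G_j| if a and b both lie in group j, else 0.

    Assumes every group 0..k-1 is non-empty. B_G is block diagonal with
    blocks (1/m) * ones(m, m), so it is PSD, entrywise non-negative,
    has unit row sums and trace k.
    """
    sizes = [0] * k
    for g in labels:
        sizes[g] += 1
    n = len(labels)
    return [[1.0 / sizes[labels[a]] if labels[a] == labels[b] else 0.0
             for b in range(n)] for a in range(n)]


def gram(x: Matrix) -> Matrix:
    """X X^T for an n x p data matrix stored as a list of rows."""
    return [[sum(u * v for u, v in zip(xa, xb)) for xb in x] for xa in x]


def inner(a: Matrix, b: Matrix) -> float:
    """Frobenius inner product <A, B> = sum over (a, b) of A_ab * B_ab."""
    return sum(u * v for ra, rb in zip(a, b) for u, v in zip(ra, rb))


def kmeans_criterion(x: Matrix, labels: Sequence[int], k: int) -> float:
    """Sum of squared distances from each point to its group mean."""
    p = len(x[0])
    total = 0.0
    for j in range(k):
        members = [row for row, g in zip(x, labels) if g == j]
        mean = [sum(row[i] for row in members) / len(members) for i in range(p)]
        total += sum((row[i] - mean[i]) ** 2 for row in members for i in range(p))
    return total


def misclassification(est: Sequence[int], truth: Sequence[int], k: int) -> float:
    """Fraction of misclassified points, minimized over relabelings of est."""
    n = len(truth)
    best = n
    for perm in permutations(range(k)):
        # perm[e] is the true label assigned to estimated label e
        errors = sum(perm[e] != t for e, t in zip(est, truth))
        best = min(best, errors)
    return best / n
```

Take `x = [[0.0, 1.0], [0.5, 1.5], [4.0, 0.0], [5.0, -1.0], [4.5, 0.5]]` and `labels = [0, 0, 1, 1, 1]`. Then `kmeans_criterion(x, labels, 2)` agrees with `sum(v * v for row in x for v in row) - inner(gram(x), partition_matrix(labels, 2))` up to rounding. Also, `misclassification([1, 1, 0, 0, 0], labels, 2)` returns `0.0`, because swapping the two labels recovers the truth exactly. Enumerating all $K!$ relabelings is only practical for small $K$; for larger $K$ one would typically solve an assignment problem instead.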
