Semidefinite Programs for Exact Recovery of a Hidden Community

We study a semidefinite programming (SDP) relaxation of the maximum likelihood estimation for exactly recovering a hidden community of cardinality $K$ from an $n \times n$ symmetric data matrix $A$, where for distinct indices $i,j$, $A_{ij} \sim P$ if $i, j$ are both in the community and $A_{ij} \sim Q$ otherwise, for two known probability distributions $P$ and $Q$. We identify a sufficient condition and a necessary condition for the success of SDP for the general model. For both the Bernoulli case ($P=\mathrm{Bern}(p)$ and $Q=\mathrm{Bern}(q)$ with $p>q$) and the Gaussian case ($P=\mathcal{N}(\mu,1)$ and $Q=\mathcal{N}(0,1)$ with $\mu>0$), which correspond to the problems of planted dense subgraph recovery and submatrix localization respectively, the general results lead to the following findings: (1) if $K=\omega(n/\log n)$, SDP attains the information-theoretic recovery limits with sharp constants; (2) if $K=\Theta(n/\log n)$, SDP is order-wise optimal, but strictly suboptimal by a constant factor; (3) if $K=o(n/\log n)$ and $K \to \infty$, SDP is order-wise suboptimal. The same critical scaling for $K$ is found to hold, up to constant factors, for the performance of SDP on the stochastic block model of $n$ vertices partitioned into multiple communities of equal size $K$. A key ingredient in the proof of the necessary condition is a construction of a primal feasible solution based on random perturbation of the true cluster matrix.
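To make the observation model concrete, the following is a minimal NumPy sketch of how a data matrix $A$ could be sampled in the Gaussian and Bernoulli cases described above. It illustrates the data model only and is not the SDP itself. The function name `sample_hidden_community`, its parameters, the default values, and the choice to set the diagonal to zero are all assumptions made for this illustration; the abstract specifies only the off-diagonal entries.

```python
import numpy as np


def sample_hidden_community(n, K, model="gaussian", mu=1.0, p=0.5, q=0.1, seed=0):
    """Sample a symmetric n x n matrix A with a hidden community of size K.

    Hypothetical helper for illustration; names and defaults are assumptions.
    Off-diagonal A_ij ~ P if i and j are both in the community, else A_ij ~ Q.
    The diagonal is set to zero (an assumption; the abstract leaves it unspecified).
    """
    rng = np.random.default_rng(seed)

    # Choose the community uniformly at random among subsets of size K.
    members = rng.choice(n, size=K, replace=False)
    in_comm = np.zeros(n, dtype=bool)
    in_comm[members] = True

    # both[i, j] is True exactly when i and j are both in the community.
    both = np.outer(in_comm, in_comm)

    if model == "gaussian":
        # P = N(mu, 1) inside the community block, Q = N(0, 1) elsewhere.
        raw = rng.normal(0.0, 1.0, size=(n, n)) + mu * both
    elif model == "bernoulli":
        # P = Bern(p) inside the community block, Q = Bern(q) elsewhere.
        probs = np.where(both, p, q)
        raw = (rng.random((n, n)) < probs).astype(float)
    else:
        raise ValueError("model must be 'gaussian' or 'bernoulli'")

    # Keep the strict upper triangle and mirror it so that A is symmetric.
    upper = np.triu(raw, k=1)
    A = upper + upper.T
    return A, in_comm


# Example usage with small, arbitrary sizes.
A_gauss, mask_gauss = sample_hidden_community(n=50, K=10, model="gaussian", mu=2.0)
A_bern, mask_bern = sample_hidden_community(n=50, K=10, model="bernoulli", p=0.6, q=0.1)
```

Exact recovery then means identifying the boolean mask `in_comm` from `A` alone. In the regimes the abstract discusses, $K$ would be compared against $n/\log n$ as $n$ grows, so the small sizes above serve only to show the shape of the data.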