Planted clique detection below the noise floor using low-rank sparse PCA

Detecting clusters and communities in graphs is useful in a wide range of applications. This paper investigates the problem of detecting a clique embedded in a random graph. Recent results have established a sharp detectability threshold for a simple algorithm based on principal component analysis (PCA). Sparse PCA of the graph's modularity matrix can successfully locate cliques in cases where PCA-based detection methods fail. We demonstrate that applying sparse PCA to low-rank approximations of the modularity matrix is a viable approach to the planted clique problem, and that it enables detection of small planted cliques in graphs for which running the standard semidefinite program for sparse PCA is not feasible.
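
The pipeline described in the abstract (modularity matrix, then low-rank approximation, then sparse PCA) can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract does not say which sparse PCA solver is applied to the low-rank matrix, so a truncated power iteration is swapped in here as a simple stand-in. The function names, the rank, the graph size, and the clique size are all hypothetical choices made for illustration, and the example clique is deliberately large enough to be easy to find.

```python
import numpy as np


def planted_clique_graph(n, p, clique_size, rng):
    """Erdos-Renyi G(n, p) adjacency matrix with a clique planted on random vertices."""
    upper = np.triu(rng.random((n, n)) < p, k=1)
    adj = (upper | upper.T).astype(float)
    clique = rng.choice(n, size=clique_size, replace=False)
    adj[np.ix_(clique, clique)] = 1.0   # connect every pair of clique vertices
    adj[clique, clique] = 0.0           # remove the self-loops this created
    return adj, np.sort(clique)


def modularity_matrix(adj):
    """Newman's modularity matrix B = A - k k^T / (2m), with k the degree vector."""
    deg = adj.sum(axis=1)
    return adj - np.outer(deg, deg) / deg.sum()


def low_rank_approx(mat, rank):
    """Keep only the `rank` largest (algebraic) eigenvalues of a symmetric matrix."""
    vals, vecs = np.linalg.eigh(mat)    # eigenvalues come back in ascending order
    vals, vecs = vals[-rank:], vecs[:, -rank:]
    return (vecs * vals) @ vecs.T


def truncated_power_method(mat, sparsity, iters=100):
    """Assumed stand-in for sparse PCA: power iteration with hard thresholding.

    Returns the sorted indices of the `sparsity` entries kept in the final iterate.
    """
    # Start from the vertices with the largest diagonal entries.
    keep = np.argsort(np.diag(mat))[-sparsity:]
    x = np.zeros(mat.shape[0])
    x[keep] = 1.0 / np.sqrt(sparsity)
    for _ in range(iters):
        y = mat @ x
        keep = np.argsort(np.abs(y))[-sparsity:]  # largest-magnitude entries
        x = np.zeros_like(y)
        x[keep] = y[keep]
        x /= np.linalg.norm(x)
    return np.sort(keep)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    adj, clique = planted_clique_graph(n=200, p=0.1, clique_size=25, rng=rng)
    low = low_rank_approx(modularity_matrix(adj), rank=5)
    support = truncated_power_method(low, sparsity=25)
    print("vertices recovered:", len(np.intersect1d(support, clique)), "of", len(clique))
```

Working on the rank-5 approximation rather than the full modularity matrix is what keeps the sparse step cheap in this sketch. Whether a given sparse PCA solver still succeeds for cliques below the PCA detectability threshold is the question the paper addresses, and this toy example does not settle it.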
