A Sparse Completely Positive Relaxation of the Modularity Maximization for Community Detection

We consider the community detection problem under either the stochastic block model (SBM) or the degree-corrected stochastic block model (DCSBM) assumption. The modularity maximization formulation of community detection is NP-hard in general. We propose a sparse and low-rank completely positive relaxation of the modularity maximization problem and develop an efficient row-by-row (RBR) type block coordinate descent (BCD) algorithm to solve it, proving an $\mathcal{O}(1/\sqrt{N})$ convergence rate to a stationary point, where $N$ is the number of iterations. A fast rounding scheme then retrieves the community structure from the solution. Non-asymptotic high-probability bounds on the misclassification rate are established to justify our approach. We further develop an asynchronous parallel RBR algorithm to speed up convergence. Extensive numerical experiments on both synthetic and real-world networks show that the proposed approach is advantageous in both clustering accuracy and numerical efficiency, and our results indicate that it is a competitive alternative for community detection on sparse networks with over 50 million nodes.
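
The row-by-row idea can be illustrated with a minimal sequential sketch. It assumes the relaxation has the form $\max_U \langle B, UU^\top\rangle$ over $U \in \mathbb{R}^{n\times k}$ whose rows $u_i$ are nonnegative, have unit norm and at most $p$ nonzeros, where $B = A - dd^\top/(2m)$ is the modularity matrix; this form is our assumption, not a statement of the paper's exact model. With the other rows fixed, the subproblem for row $i$ is linear in $u_i$ with coefficient $b = \sum_{j\ne i} B_{ij} u_j$, so its maximizer keeps the $p$ largest positive entries of $b$ and normalizes them. The function names, the optional proximal weight `sigma`, the round-robin initialization, and the argmax rounding are hypothetical stand-ins; the paper's rounding scheme and its asynchronous parallel variant are not reproduced here.

```python
def rbr_sweeps(adj, k, p, sweeps=20, sigma=0.0):
    """Hypothetical sequential RBR sketch for an ASSUMED relaxation:
    max <B, U U^T>, rows of U nonnegative, unit norm, at most p nonzeros.

    adj   : neighbor lists of an undirected graph without self-loops
    sigma : optional proximal weight (illustrative, not from the paper)
    """
    n = len(adj)
    deg = [len(nb) for nb in adj]
    two_m = max(sum(deg), 1)  # avoid division by zero on an edgeless graph
    # Assumed initialization: node i starts in column i % k.
    U = [[1.0 if c == i % k else 0.0 for c in range(k)] for i in range(n)]
    # s = sum_j d_j u_j, updated incrementally so a row update only
    # touches the neighbors of i plus O(k log k) work, never all n rows.
    s = [sum(deg[j] * U[j][c] for j in range(n)) for c in range(k)]
    for _ in range(sweeps):
        for i in range(n):
            # b = sum_{j != i} B_ij u_j with B_ij = A_ij - d_i d_j / (2m)
            b = [0.0] * k
            for j in adj[i]:
                for c in range(k):
                    b[c] += U[j][c]
            for c in range(k):
                b[c] -= deg[i] * (s[c] - deg[i] * U[i][c]) / two_m
                b[c] += sigma * U[i][c]  # proximal term toward the old row
            # Closed-form maximizer: p largest positive entries, normalized.
            top = sorted(range(k), key=lambda c: b[c], reverse=True)[:p]
            new = [0.0] * k
            for c in top:
                if b[c] > 0:
                    new[c] = b[c]
            norm = sum(v * v for v in new) ** 0.5
            if norm == 0.0:
                new[top[0]] = 1.0  # no positive entry: pick the largest one
            else:
                new = [v / norm for v in new]
            for c in range(k):
                s[c] += deg[i] * (new[c] - U[i][c])
            U[i] = new
    return U


def round_labels(U):
    """Simple argmax rounding (a stand-in for the paper's scheme)."""
    return [max(range(len(row)), key=row.__getitem__) for row in U]


def modularity(adj, labels):
    """Q = (1/2m) * sum_ij (A_ij - d_i d_j / 2m) * [labels_i == labels_j]."""
    deg = [len(nb) for nb in adj]
    two_m = sum(deg)
    inside = sum(1 for i, nb in enumerate(adj) for j in nb if labels[i] == labels[j])
    vol = {}
    for i, lab in enumerate(labels):
        vol[lab] = vol.get(lab, 0) + deg[i]
    return (inside - sum(v * v for v in vol.values()) / two_m) / two_m


# Two 4-cliques {0,1,2,3} and {4,5,6,7} joined by the single edge 3-4.
edges = [(a, b) for blk in (range(0, 4), range(4, 8))
         for a in blk for b in blk if a < b] + [(3, 4)]
adj = [[] for _ in range(8)]
for a, b in edges:
    adj[a].append(b)
    adj[b].append(a)
labels = round_labels(rbr_sweeps(adj, k=2, p=1))
print(labels)  # [1, 1, 1, 1, 0, 0, 0, 0]
```

With $p = 1$ each row update reduces to a greedy single-node move, and on this toy graph one sweep already separates the two cliques, giving modularity $11/26$. Maintaining $s = U^\top d$ is what keeps each update proportional to the node's degree, which is the property that matters on sparse graphs.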
