Improved spectral community detection in large heterogeneous networks

In this article, we propose and study the performance of spectral community detection based on a family of "α-normalized" adjacency matrices of the form $D^{-\alpha} A D^{-\alpha}$, where $A$ is the adjacency matrix and $D$ the diagonal degree matrix, in heterogeneous dense graph models. We show that the previously used normalizations based on $A$ or $D^{-1} A D^{-1}$ (the cases $\alpha = 0$ and $\alpha = 1$) are in general suboptimal in terms of correct recovery rates. Relying on advanced random matrix methods, we instead prove the existence of an optimal value $\alpha_{\text{opt}}$ of the parameter $\alpha$ in our generic model, and we further provide an online estimate of $\alpha_{\text{opt}}$ based only on the node degrees in the graph. Numerical simulations show that the proposed method outperforms state-of-the-art spectral approaches on moderately dense to dense heterogeneous graphs.
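As a rough illustration of the α-normalized spectral approach, the following pure-Python sketch builds $D^{-\alpha} A D^{-\alpha}$ entrywise as $A_{ij} / (d_i^{\alpha} d_j^{\alpha})$. It then splits a graph into two communities by the sign of the eigenvector associated with the second-largest eigenvalue. This is not the authors' implementation. The function names (`alpha_normalized`, `top_two_eigenvectors`, `two_way_split`) are hypothetical, and α is fixed by hand rather than set to the degree-based estimate of $\alpha_{\text{opt}}$, which is not reproduced here. Shifted power iteration stands in for a proper eigensolver, and any additional centering or scaling the full method may apply to the matrix is omitted. Setting α = 0 uses $A$ itself, and α = 1 gives $D^{-1} A D^{-1}$.

```python
import math
import random


def alpha_normalized(adj, alpha):
    """Return D^{-alpha} A D^{-alpha} for a dense symmetric adjacency matrix (list of lists).

    Assumes no isolated nodes, so every degree d_i is positive.
    """
    n = len(adj)
    deg = [sum(row) for row in adj]
    scale = [d ** (-alpha) for d in deg]  # d_i^{-alpha}
    return [[scale[i] * adj[i][j] * scale[j] for j in range(n)] for i in range(n)]


def matvec(m, x):
    return [sum(mij * xj for mij, xj in zip(row, x)) for row in m]


def normalize(x):
    norm = math.sqrt(sum(v * v for v in x))
    return [v / norm for v in x]


def top_two_eigenvectors(m, iters=2000, seed=0):
    """Power iteration for the eigenvectors of the two largest (algebraic) eigenvalues of symmetric m."""
    n = len(m)
    # Gershgorin bound: every eigenvalue of m lies in [-c, c], so m + c*I is
    # positive semidefinite and power iteration on it targets the largest
    # algebraic eigenvalues of m rather than the largest in modulus.
    c = max(sum(abs(v) for v in row) for row in m)

    def shifted(x):
        return [y + c * xi for y, xi in zip(matvec(m, x), x)]

    v1 = normalize([1.0] * n)
    for _ in range(iters):
        v1 = normalize(shifted(v1))

    rng = random.Random(seed)
    v2 = [rng.uniform(-1.0, 1.0) for _ in range(n)]
    for _ in range(iters):
        # Deflation by Gram-Schmidt: keep v2 orthogonal to the leading eigenvector.
        proj = sum(a * b for a, b in zip(v2, v1))
        v2 = shifted(normalize([a - proj * b for a, b in zip(v2, v1)]))
    proj = sum(a * b for a, b in zip(v2, v1))
    v2 = normalize([a - proj * b for a, b in zip(v2, v1)])
    return v1, v2


def two_way_split(adj, alpha):
    """Assign each node to community 0 or 1 by the sign of the second eigenvector."""
    _, v2 = top_two_eigenvectors(alpha_normalized(adj, alpha))
    return [0 if x >= 0 else 1 for x in v2]


# Toy graph: two 4-cliques {0,1,2,3} and {4,5,6,7} joined by the single edge 3-4.
n = 8
adj = [[0] * n for _ in range(n)]
for block in (range(0, 4), range(4, 8)):
    for i in block:
        for j in block:
            if i != j:
                adj[i][j] = 1
adj[3][4] = adj[4][3] = 1

for alpha in (0.0, 0.5, 1.0):
    print(alpha, two_way_split(adj, alpha))
```

On this toy graph each of the three values of α separates the two cliques; the label assigned to each side depends on the arbitrary sign of the eigenvector. The sketch only covers k = 2. For more communities, a common approach is to stack the top k eigenvectors and cluster their rows (for instance with k-means).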
