Spectral redemption in clustering sparse networks

Significance

Spectral algorithms are widely applied to data clustering problems, including finding communities or partitions in graphs and networks. We propose a way of encoding sparse data using a “nonbacktracking” matrix, and show that the corresponding spectral algorithm performs optimally for some popular generative models, including the stochastic block model. This is in contrast with classical spectral algorithms, based on the adjacency matrix, random walk matrix, and graph Laplacian, which perform poorly in the sparse case, failing significantly above a recently discovered phase transition for the detectability of communities. Further support for the method is provided by experiments on real networks as well as by theoretical arguments and analogies from probability theory, statistical physics, and the theory of random matrices.

Spectral algorithms are classic approaches to clustering and community detection in networks. However, for sparse networks the standard versions of these algorithms are suboptimal, in some cases completely failing to detect communities even when other algorithms such as belief propagation can do so. Here, we present a class of spectral algorithms based on a nonbacktracking walk on the directed edges of the graph. The spectrum of this operator is much better behaved than that of the adjacency matrix or other commonly used matrices, maintaining a strong separation between the bulk eigenvalues and the eigenvalues relevant to community structure even in the sparse case. We show that our algorithm is optimal for graphs generated by the stochastic block model, detecting communities all the way down to the theoretical limit. We also show the spectrum of the nonbacktracking operator for some real-world networks, illustrating its advantages over traditional spectral clustering.
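To make the central object concrete, the following is a minimal sketch of how a nonbacktracking matrix can be built for a small undirected graph and its spectrum inspected. It assumes the usual convention that the entry for the directed-edge pair (u→v, w→x) is 1 when v = w and x ≠ u, and 0 otherwise; the function name `nonbacktracking_matrix` and the dense NumPy representation are illustrative choices, not taken from the paper, and would not scale to large networks.

```python
import numpy as np


def nonbacktracking_matrix(edges):
    """Build a dense nonbacktracking matrix for an undirected simple graph.

    `edges` is a list of (u, v) pairs, each undirected edge listed once.
    Returns (B, directed), where `directed` lists the 2m directed edges in
    the row/column order used by B.
    Assumed convention: B[(u->v), (v->w)] = 1 whenever w != u.
    """
    # Each undirected edge contributes two directed edges.
    directed = []
    for u, v in edges:
        directed.append((u, v))
        directed.append((v, u))
    index = {e: i for i, e in enumerate(directed)}

    # Out-neighbours of every vertex, used to continue a walk.
    out = {}
    for u, v in directed:
        out.setdefault(u, []).append(v)

    B = np.zeros((len(directed), len(directed)))
    for (u, v), i in index.items():
        for w in out.get(v, []):
            if w != u:  # forbid the immediate return v -> u
                B[i, index[(v, w)]] = 1.0
    return B, directed


if __name__ == "__main__":
    # Toy example: a triangle. Every directed edge has exactly one
    # nonbacktracking continuation.
    B, directed = nonbacktracking_matrix([(0, 1), (1, 2), (2, 0)])
    eigenvalues = np.linalg.eigvals(B)
    print(directed)
    print(np.round(eigenvalues, 3))
```

Because the matrix is not symmetric, its eigenvalues are in general complex, which is why the sketch uses `np.linalg.eigvals` and not a symmetric eigensolver. For real sparse networks a sparse matrix format and an iterative eigensolver would be the more natural choice.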
