Spectral Clustering of graphs with the Bethe Hessian

Spectral clustering is a standard approach to label nodes on a graph by studying the (largest or lowest) eigenvalues of a symmetric real matrix such as e.g. the adjacency or the Laplacian. Recently, it has been argued that using instead a more complicated, non-symmetric and higher dimensional operator, related to the non-backtracking walk on the graph, leads to improved performance in detecting clusters, and even to optimal performance for the stochastic block model. Here, we propose to use instead a simpler object, a symmetric real matrix known as the Bethe Hessian operator, or deformed Laplacian. We show that this approach combines the performances of the non-backtracking operator, thus detecting clusters all the way down to the theoretical limit in the stochastic block model, with the computational, theoretical and memory advantages of real symmetric matrices.

[1]  Kenji Fukumizu,et al.  Graph Zeta Function in the Bethe Free Energy and Loopy Belief Propagation , 2009, NIPS.

[2]  P. Bickel,et al.  A nonparametric view of network models and Newman–Girvan and other modularities , 2009, Proceedings of the National Academy of Sciences.

[3]  S. Stenholm Information, Physics and Computation, by Marc Mézard and Andrea Montanari , 2010 .

[4]  Ulrike von Luxburg,et al.  A tutorial on spectral clustering , 2007, Stat. Comput..

[5]  Cristopher Moore,et al.  Asymptotic analysis of the stochastic block model for modular networks and its algorithmic applications , 2011, Physical review. E, Statistical, nonlinear, and soft matter physics.

[6]  M. Newman,et al.  Finding community structure in networks using the eigenvectors of matrices. , 2006, Physical review. E, Statistical, nonlinear, and soft matter physics.

[7]  Axel Ruhe ALGORITHMS FOR THE NONLINEAR EIGENVALUE PROBLEM , 1973 .

[8]  Marc Lelarge,et al.  Resolvent of large random graphs , 2010 .

[9]  Florent Krzakala,et al.  Spectral density of the non-backtracking operator on random graphs , 2014, ArXiv.

[10]  Lada A. Adamic,et al.  The political blogosphere and the 2004 U.S. election: divided they blog , 2005, LinkKDD '05.

[11]  Michael I. Jordan,et al.  Graphical Models, Exponential Families, and Variational Inference , 2008, Found. Trends Mach. Learn..

[12]  Laurent Massoulié,et al.  Community detection thresholds and the weak Ramanujan property , 2013, STOC.

[13]  Koujin Takeda,et al.  Cavity approach to the spectral density of sparse symmetric random matrices. , 2008, Physical review. E, Statistical, nonlinear, and soft matter physics.

[14]  Kathryn B. Laskey,et al.  Stochastic blockmodels: First steps , 1983 .

[15]  D. Lusseau,et al.  The bottlenose dolphin community of Doubtful Sound features a large proportion of long-lasting associations , 2003, Behavioral Ecology and Sociobiology.

[16]  Elchanan Mossel,et al.  Spectral redemption in clustering sparse networks , 2013, Proceedings of the National Academy of Sciences.

[17]  M E J Newman,et al.  Community structure in social and biological networks , 2001, Proceedings of the National Academy of Sciences of the United States of America.

[18]  W. Zachary,et al.  An Information Flow Model for Conflict and Fission in Small Groups , 1977, Journal of Anthropological Research.

[19]  Hilbert J. Kappen,et al.  Validity Estimates for Loopy Belief Propagation on Binary Real-world Networks , 2004, NIPS.

[20]  Cristopher Moore,et al.  Phase transition in the detection of modules in sparse networks , 2011, Physical review letters.

[21]  F. Ricci-Tersenghi The Bethe approximation for solving the inverse Ising problem: a comparison with other inference methods , 2011, 1112.4814.

[22]  Yuchung J. Wang,et al.  Stochastic Blockmodels for Directed Graphs , 1987 .