The Computer Science and Physics of Community Detection: Landscapes, Phase Transitions, and Hardness

Community detection in graphs is the problem of finding groups of vertices that are more densely connected to each other than they are to the rest of the graph. This problem has a long history, but it is undergoing a resurgence of interest due to the need to analyze social and biological networks. While there are many ways to formalize it, one of the most popular is as an inference problem, where a "ground truth" community structure is built into the graph by the probabilistic process that generates it. The task is then to recover the ground truth knowing only the graph. Recently it was discovered, first heuristically in physics and then rigorously in probability and computer science, that this problem has a phase transition at which it suddenly becomes impossible. Namely, if the graph is too sparse, or the probabilistic process that generates it is too noisy, then no algorithm can find a partition that is correlated with the planted one, or even tell whether there are communities at all, i.e., distinguish the graph from a purely random one with high probability. Above this information-theoretic threshold, there is a second threshold beyond which polynomial-time algorithms are known to succeed; in between, there is a regime in which community detection is possible, but conjectured to require exponential time. For computer scientists, this field offers a wealth of new ideas and open questions, with connections to probability and combinatorics, message-passing algorithms, and random matrix theory. Perhaps more importantly, it provides a window into the cultures of statistical physics and statistical inference, and how those cultures think about distributions of instances, landscapes of solutions, and hardness.
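To make the inference setting concrete, here is a minimal sketch of a planted two-group model. Everything in it is an assumption made for illustration and is not taken from the text above: the function names `planted_partition`, `overlap`, and `above_detectability_threshold`, the restriction to two groups chosen uniformly at random, the sparse parameterization with edge probabilities `c_in / n` and `c_out / n`, and the detectability condition (c_in - c_out)^2 > 2 (c_in + c_out), which is the known threshold for this particular symmetric two-group case.

```python
import random


def planted_partition(n, c_in, c_out, seed=0):
    """Sample a sparse two-group planted-partition graph (illustrative only).

    Each vertex gets a hidden label 0 or 1 uniformly at random. A pair of
    vertices is joined with probability c_in / n if the labels match and
    c_out / n otherwise. Returns (labels, edges).
    """
    rng = random.Random(seed)
    labels = [rng.randrange(2) for _ in range(n)]
    p_in, p_out = c_in / n, c_out / n
    edges = []
    for u in range(n):
        for v in range(u + 1, n):
            p = p_in if labels[u] == labels[v] else p_out
            if rng.random() < p:
                edges.append((u, v))
    return labels, edges


def overlap(truth, guess):
    """Agreement with the planted labels, up to swapping the two labels.

    Rescaled so that 1.0 means perfect recovery and 0.0 means the guess
    agrees with the truth on exactly half the vertices.
    """
    n = len(truth)
    agree = sum(t == g for t, g in zip(truth, guess))
    best = max(agree, n - agree) / n
    return 2 * best - 1


def above_detectability_threshold(c_in, c_out):
    """Assumed condition for the symmetric two-group case only."""
    return (c_in - c_out) ** 2 > 2 * (c_in + c_out)


labels, edges = planted_partition(200, c_in=8, c_out=2, seed=1)
print(len(edges), above_detectability_threshold(8, 2))
```

An algorithm "finds a partition correlated with the planted one" in this sketch if its guess has overlap bounded away from zero as `n` grows; the sampler only produces instances and does not attempt recovery.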
