Reconstruction and estimation in the planted partition model

The planted partition model (also known as the stochastic blockmodel) is a classical cluster-exhibiting random graph model that has been extensively studied in statistics, physics, and computer science. In its simplest form, the planted partition model is a model for random graphs on $$n$$ nodes with two equal-sized clusters, with a between-class edge probability of $$q$$ and a within-class edge probability of $$p$$. Although most of the literature on this model has focused on the case of increasing degrees (i.e., $$pn, qn \rightarrow \infty$$ as $$n \rightarrow \infty$$), the sparse case $$p, q = O(1/n)$$ is interesting both from a mathematical and an applied point of view. A striking conjecture of Decelle, Krzakala, Moore and Zdeborová, based on deep, non-rigorous ideas from statistical physics, gave a precise prediction for the algorithmic threshold of clustering in the sparse planted partition model. In particular, if $$p = a/n$$ and $$q = b/n$$, then Decelle et al. conjectured that it is possible to cluster in a way correlated with the true partition if $$(a - b)^2 > 2(a + b)$$, and impossible if $$(a - b)^2 < 2(a + b)$$. By comparison, the best-known rigorous result is that of Coja-Oghlan, who showed that clustering is possible if $$(a - b)^2 > C (a + b)$$ for some sufficiently large $$C$$. We prove half of the prediction of Decelle et al., showing that it is indeed impossible to cluster if $$(a - b)^2 < 2(a + b)$$. Furthermore, we show that it is impossible even to estimate the model parameters from the graph when $$(a - b)^2 < 2(a + b)$$; on the other hand, we provide a simple and efficient algorithm for estimating $$a$$ and $$b$$ when $$(a - b)^2 > 2(a + b)$$. Following Decelle et al., our work establishes a rigorous connection between the clustering problem, spin-glass models on the Bethe lattice, and the so-called reconstruction problem. This connection points to fascinating applications and open problems.
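
To make the model and the threshold condition concrete, the following is a minimal Python sketch, not the algorithm of the paper. It samples a sparse two-cluster planted partition graph with $$p = a/n$$ and $$q = b/n$$, and checks on which side of $$(a - b)^2 = 2(a + b)$$ a parameter pair falls. The function names `above_threshold` and `sample_planted_partition`, the ±1 label encoding, and the edge-list representation are illustrative assumptions.

```python
import random


def above_threshold(a, b):
    """Return True when (a - b)^2 > 2(a + b), the conjectured clustering regime."""
    return (a - b) ** 2 > 2 * (a + b)


def sample_planted_partition(n, a, b, seed=None):
    """Sample a sparse planted partition graph on n nodes (hypothetical helper).

    Nodes are split into two equal-sized clusters labelled +1 and -1.
    Each within-cluster pair is joined with probability p = a/n and each
    between-cluster pair with probability q = b/n, independently.
    Returns (labels, edges) where edges is a list of pairs (i, j) with i < j.
    """
    if n % 2 != 0:
        raise ValueError("n must be even for two equal-sized clusters")
    rng = random.Random(seed)
    labels = [1] * (n // 2) + [-1] * (n // 2)
    rng.shuffle(labels)
    p, q = a / n, b / n
    edges = []
    # Naive O(n^2) loop over all pairs; adequate for small illustrative n.
    for i in range(n):
        for j in range(i + 1, n):
            prob = p if labels[i] == labels[j] else q
            if rng.random() < prob:
                edges.append((i, j))
    return labels, edges


if __name__ == "__main__":
    labels, edges = sample_planted_partition(1000, a=5, b=1, seed=0)
    mean_degree = 2 * len(edges) / len(labels)
    print("above threshold:", above_threshold(5, 1))
    print("empirical mean degree:", mean_degree)
```

In this sketch the expected average degree is approximately $$(a + b)/2$$, which stays bounded as $$n$$ grows; this is what distinguishes the sparse regime from the increasing-degree case.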
