Global and Local Information in Clustering Labeled Block Models

The stochastic block model is a classical cluster-exhibiting random graph model that has been widely studied in statistics, physics, and computer science. In its simplest form, the model is a random graph with two equal-sized clusters, with intracluster edge probability p and intercluster edge probability q. We focus on the sparse case, i.e., p, q = O(1/n), which is practically more relevant and also mathematically more challenging. A conjecture of Decelle, Krzakala, Moore, and Zdeborová, based on ideas from statistical physics, predicted a specific threshold for clustering. The negative direction of the conjecture was proved by Mossel, Neeman, and Sly (2012), and more recently the positive direction was proved independently by Massoulié and by Mossel, Neeman, and Sly. In many real network clustering problems, the nodes carry information as well. We study the interplay between node and network information in clustering through a labeled block model, in which, in addition to the edge information, the true cluster labels of a small fraction of the nodes are revealed. In the case of two clusters, we show that below the threshold, a small amount of node information does not affect recovery. On the other hand, we show that for any small amount of revealed information, efficient local clustering is achievable as long as the number of clusters is sufficiently large (as a function of the amount of revealed information).
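To make the model concrete, the following is a minimal Python sketch of sampling a sparse two-cluster labeled block model. It rests on several assumptions that the abstract does not state: the parametrization p = a/n and q = b/n, the choice to reveal each node's label independently with probability `reveal_fraction`, and the function names themselves, which are hypothetical. The helper `above_threshold` encodes the commonly stated form of the conjectured two-cluster threshold, (a - b)^2 > 2(a + b), under that same parametrization.

```python
import random


def sample_labeled_block_model(n, a, b, reveal_fraction, seed=None):
    """Sample a sparse two-cluster block model with some labels revealed.

    Assumed parametrization (not from the abstract): p = a/n, q = b/n.
    Assumed reveal rule: each label is revealed independently with
    probability reveal_fraction.
    """
    rng = random.Random(seed)

    # Two clusters of (nearly) equal size, encoded as +1 and -1.
    labels = [1] * (n // 2) + [-1] * (n - n // 2)
    rng.shuffle(labels)

    p, q = a / n, b / n
    edges = []
    for u in range(n):
        for v in range(u + 1, n):
            # Intracluster pairs connect with probability p, others with q.
            prob = p if labels[u] == labels[v] else q
            if rng.random() < prob:
                edges.append((u, v))

    # Node information: true labels of a small random subset of nodes.
    revealed = {v: labels[v] for v in range(n) if rng.random() < reveal_fraction}
    return labels, edges, revealed


def above_threshold(a, b):
    """Commonly stated two-cluster threshold: (a - b)^2 > 2 (a + b)."""
    return (a - b) ** 2 > 2 * (a + b)


if __name__ == "__main__":
    labels, edges, revealed = sample_labeled_block_model(
        n=200, a=5.0, b=1.0, reveal_fraction=0.1, seed=0
    )
    print(len(edges), "edges;", len(revealed), "revealed labels")
    print("above threshold:", above_threshold(5.0, 1.0))
```

The quadratic loop over node pairs is only suitable for small n; it is kept here because it mirrors the definition of the model directly.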

[1]  V. Climenhaga.  Markov chains and mixing times, 2013.

[2]  Y. Peres, et al.  Broadcasting on trees and the Ising model, 2000.

[3]  Kathryn B. Laskey, et al.  Stochastic blockmodels: First steps, 1983.

[4]  Mikko Alava, et al.  Branching Processes, 2009, Encyclopedia of Complexity and Systems Science.

[5]  Florent Krzakala, et al.  Comparative study for inference of hidden classes in stochastic block models, 2012, ArXiv.

[6]  Elchanan Mossel, et al.  Information flow on trees, 2001, math/0107033.

[7]  Laurent Massoulié, et al.  Community detection thresholds and the weak Ramanujan property, 2013, STOC.

[8]  Bernhard Schölkopf, et al.  Cluster Kernels for Semi-Supervised Learning, 2002, NIPS.

[9]  Cristopher Moore, et al.  Phase transitions in semisupervised clustering of sparse networks, 2014, Physical review. E, Statistical, nonlinear, and soft matter physics.

[10]  Martin E. Dyer, et al.  The Solution of Some Random NP-Hard Problems in Polynomial Expected Time, 1989, J. Algorithms.

[11]  T. Snijders, et al.  Estimation and Prediction for Stochastic Blockmodels for Graphs with Latent Block Structure, 1997.

[12]  Armen E. Allahverdyan, et al.  Community detection with and without prior information, 2009, ArXiv.

[13]  Armen E. Allahverdyan, et al.  Phase Transitions in Community Detection: A Solvable Toy Model, 2013, ArXiv.

[14]  Amin Coja-Oghlan, et al.  Graph Partitioning via Adaptive Spectral Techniques, 2009, Combinatorics, Probability and Computing.

[15]  Arindam Banerjee, et al.  Semi-supervised Clustering by Seeding, 2002, ICML.

[16]  B. Szegedy, et al.  Limits of locally–globally convergent graph sequences, 2012, Geometric and Functional Analysis.

[17]  Allan Sly, et al.  Reconstruction for the Potts model, 2009, STOC '09.

[18]  Elchanan Mossel, et al.  Survey: Information Flow on Trees, 2004.

[19]  Madhu Sudan, et al.  Limits of local algorithms over sparse random graphs, 2013, ITCS.

[20]  Elchanan Mossel, et al.  A Proof of the Block Model Threshold Conjecture, 2013, Combinatorica.

[21]  Frank McSherry, et al.  Spectral partitioning of random graphs, 2001, Proceedings 2001 IEEE International Conference on Cluster Computing.

[22]  Elchanan Mossel, et al.  Belief propagation, robust reconstruction and optimal recovery of block models, 2013, COLT.

[23]  Cristopher Moore, et al.  Asymptotic analysis of the stochastic block model for modular networks and its algorithmic applications, 2011, Physical review. E, Statistical, nonlinear, and soft matter physics.

[24]  Richard M. Karp, et al.  Algorithms for graph partitioning on the planted partition model, 2001, Random Struct. Algorithms.

[25]  Allan Sly.  Reconstruction of symmetric Potts Models, 2008, 0811.1208.

[26]  Mark Jerrum, et al.  The Metropolis Algorithm for Graph Bisection, 1998, Discret. Appl. Math.

[27]  Raymond J. Mooney, et al.  A probabilistic framework for semi-supervised clustering, 2004, KDD.

[28]  Elchanan Mossel.  Reconstruction on Trees: Beating the Second Eigenvalue, 2001.

[29]  P. Bickel, et al.  A nonparametric view of network models and Newman–Girvan and other modularities, 2009, Proceedings of the National Academy of Sciences.

[30]  Fedor Nazarov, et al.  Perfect matchings as IID factors on non-amenable groups, 2009, Eur. J. Comb.

[31]  Varun Kanade, et al.  Global and Local Information in Clustering Labeled Block Models, 2016, IEEE Trans. Inf. Theory.