Decoding Binary Node Labels from Censored Edge Measurements: Phase Transition and Efficient Recovery

We consider the problem of clustering a graph G into two communities by observing a subset of the vertex correlations. Specifically, we consider the inverse problem with observed variables Y = B<sub>G</sub>x ⊕ Z, where B<sub>G</sub> is the incidence matrix of a graph G, x is the vector of unknown vertex variables (with a uniform prior), and Z is a noise vector with i.i.d. Bernoulli(ε) entries. All variables and operations are Boolean. This model is motivated by coding, synchronization, and community detection problems. In particular, it corresponds to a stochastic block model or a correlation clustering problem with two communities and censored edges. Without noise, exact recovery of x (up to a global flip) is possible if and only if the graph G is connected, with a sharp threshold at the edge probability log(n)/n for Erdős-Rényi random graphs. The first goal of this paper is to determine how the edge probability p needs to scale to allow exact recovery in the presence of noise. Defining the degree rate of the graph by α = np/log(n), it is shown that exact recovery is possible if and only if α > 2/(1 - 2ε)<sup>2</sup> + o(1/(1 - 2ε)<sup>2</sup>). In other words, 2/(1 - 2ε)<sup>2</sup> is the information-theoretic threshold for exact recovery at low SNR. In addition, an efficient recovery algorithm based on semidefinite programming is proposed and shown to succeed in the threshold regime up to twice the optimal rate. For a deterministic graph G, defining the degree rate as α = d/log(n), where d is the minimum degree of the graph, it is shown that the proposed method achieves the rate α > 4((1 + λ)/(1 - λ)<sup>2</sup>)/(1 - 2ε)<sup>2</sup> + o(1/(1 - 2ε)<sup>2</sup>), where 1 - λ is the spectral gap of the graph G.
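
The measurement model and the noiseless recovery claim can be illustrated with a minimal Python sketch. This is not the paper's semidefinite programming method: it only samples Y = B<sub>G</sub>x ⊕ Z on an Erdős-Rényi graph, evaluates the leading-order threshold 2/(1 - 2ε)<sup>2</sup>, and, for ε = 0 only, recovers x up to a global flip by propagating labels along edges. The function names (`threshold_alpha`, `sample_measurements`, `decode_noiseless`) are hypothetical and chosen for illustration.

```python
import random
from collections import deque


def threshold_alpha(eps):
    """Leading-order degree-rate threshold 2 / (1 - 2*eps)^2 (o(.) term ignored)."""
    return 2.0 / (1.0 - 2.0 * eps) ** 2


def sample_measurements(n, p, eps, rng):
    """Draw uniform labels x and one noisy XOR measurement per observed edge.

    Each pair (i, j) with i < j is observed independently with probability p,
    and the observation is x[i] XOR x[j] XOR z with z ~ Bernoulli(eps).
    """
    x = [rng.randint(0, 1) for _ in range(n)]
    y = {}
    for i in range(n):
        for j in range(i + 1, n):
            if rng.random() < p:
                z = 1 if rng.random() < eps else 0
                y[(i, j)] = x[i] ^ x[j] ^ z
    return x, y


def decode_noiseless(n, y):
    """Recover labels up to a global flip when eps = 0 (assumption of this sketch).

    Fixes the label of vertex 0 to 0 and propagates along edges by BFS.
    Returns None if the observed graph is disconnected, in which case
    exact recovery is impossible even without noise.
    """
    adj = [[] for _ in range(n)]
    for (i, j), v in y.items():
        adj[i].append((j, v))
        adj[j].append((i, v))
    est = [None] * n
    est[0] = 0
    queue = deque([0])
    while queue:
        u = queue.popleft()
        for w, v in adj[u]:
            if est[w] is None:
                est[w] = est[u] ^ v  # x[w] = x[u] XOR (x[u] XOR x[w])
                queue.append(w)
    if any(label is None for label in est):
        return None
    return est
```

For example, calling `sample_measurements(n, p, 0.0, random.Random(0))` with p well above log(n)/n and passing the result to `decode_noiseless` should return either x or its bitwise complement. With ε > 0 the propagation step accumulates errors along paths, which is why the paper turns to a different estimator in the noisy regime.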
