Mutual information for symmetric rank-one matrix estimation: A proof of the replica formula

Factorizing low-rank matrices has many applications in machine learning and statistics. For probabilistic models in the Bayes optimal setting, a general expression for the mutual information has been proposed using heuristic statistical physics computations, and proven in a few specific cases. Here, we show how to rigorously prove the conjectured formula for the symmetric rank-one case. This allows us to express the minimum mean-square error and to characterize the detectability phase transitions in a large class of estimation problems, ranging from community detection to sparse PCA. We also show that, for a large set of parameters, an iterative algorithm called approximate message-passing is Bayes optimal. There is, however, a gap between what currently known polynomial-time algorithms can achieve and what is expected information-theoretically. Additionally, the proof technique is of independent interest and combines three essential ingredients: the interpolation method introduced in statistical physics by Guerra, the analysis of the approximate message-passing algorithm, and the theory of spatial coupling and threshold saturation in coding. Our approach is generic and applicable to other open problems in statistical estimation where heuristic statistical physics predictions are available.
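To make the objects in the abstract concrete, here is a minimal numerical sketch (not the paper's code) of the replica-symmetric potential and the approximate message-passing state evolution for the symmetric rank-one "spiked Wigner" model with a Rademacher prior, under one common normalization Y_ij = sqrt(lam/n) x_i x_j + W_ij with W_ij ~ N(0,1). The scaling, the prior, and all function names are assumptions made for this illustration.

```python
import numpy as np

# Sketch for the symmetric rank-one (spiked Wigner) model, Rademacher prior.
# Assumed normalization: Y_ij = sqrt(lam/n) * x_i x_j + W_ij, W_ij ~ N(0,1), i < j.

# Gauss-Hermite quadrature (probabilists' weight) for expectations over Z ~ N(0,1).
nodes, weights = np.polynomial.hermite_e.hermegauss(61)
weights = weights / weights.sum()

def scalar_mi(gamma):
    """I(X0; sqrt(gamma) X0 + Z) in nats for a Rademacher X0 (standard identity)."""
    y = gamma + np.sqrt(gamma) * nodes
    return gamma - np.sum(weights * np.log(np.cosh(y)))

def rs_potential(q, lam):
    """Replica-symmetric potential; its minimum over q gives the conjectured lim I(X;Y)/n."""
    return lam * (1.0 - q) ** 2 / 4.0 + scalar_mi(lam * q)

def state_evolution(lam, iters=300):
    """AMP state evolution for the Bayes-optimal (tanh) denoiser, from a tiny initial overlap."""
    q = 1e-6
    for _ in range(iters):
        y = lam * q + np.sqrt(lam * q) * nodes
        q = np.sum(weights * np.tanh(y))  # E[X0 * E[X0 | sqrt(lam*q) X0 + Z]]
    return q

if __name__ == "__main__":
    qs = np.linspace(0.0, 1.0, 401)
    for lam in (0.5, 1.5, 3.0):
        q_star = qs[np.argmin([rs_potential(q, lam) for q in qs])]
        q_amp = state_evolution(lam)
        print(f"lam={lam:3.1f}  q*(replica)={q_star:.3f}  q(AMP)={q_amp:.3f}  "
              f"matrix MMSE ~ {1.0 - q_star**2:.3f}")
```

Under this toy normalization the fixed points of the state evolution are the stationary points of the replica-symmetric potential, and the estimation transition for the Rademacher prior sits at lam = 1: the minimizer q* is zero below it (MMSE equal to the prior variance) and strictly positive above it.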
