Rank-one matrix estimation: analysis of algorithmic and information theoretic limits by the spatial coupling method

Factorizing low-rank matrices is a problem with many applications in machine learning and statistics, ranging from sparse PCA to community detection and sub-matrix localization. For probabilistic models in the Bayes-optimal setting, general expressions for the mutual information have been proposed using powerful heuristic statistical physics computations via the replica and cavity methods, and proven in a few specific cases by a variety of methods. Here, we use the spatial coupling methodology, developed in the framework of error-correcting codes, to rigorously derive the mutual information for the symmetric rank-one case. We characterize the detectability phase transitions in a large set of estimation problems and show that there exists a gap between what currently known polynomial-time algorithms (in particular spectral methods and approximate message-passing) can achieve and what is expected information-theoretically. Moreover, we show that the computational gap vanishes for the proposed spatially coupled model, a promising feature with many possible applications. Our proof technique is of independent interest and exploits three essential ingredients: the interpolation method first introduced in statistical physics, the analysis of approximate message-passing algorithms first introduced in compressive sensing, and the theory of threshold saturation for spatially coupled systems first developed in coding theory. Our approach is very generic and can be applied to many other open problems in statistical estimation where heuristic statistical physics predictions are available.
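As a rough illustration of the kind of detectability transition discussed above, the following sketch simulates a symmetric rank-one estimation problem and applies a plain spectral estimator. The abstract does not state the observation model, so the specific form used here (W = s sᵀ/√n + √Δ Z with a Rademacher signal s and symmetric Gaussian noise Z) is an assumption, as are the function names `sample_spiked_wigner` and `spectral_overlap` and the chosen values of `n` and `delta`. It is not the spatially coupled construction nor the approximate message-passing algorithm analyzed in the paper.

```python
import numpy as np

def sample_spiked_wigner(n, delta, rng):
    """Assumed model: W = s s^T / sqrt(n) + sqrt(delta) * Z, with s in {-1, +1}^n."""
    s = rng.choice([-1.0, 1.0], size=n)
    g = rng.standard_normal((n, n))
    z = np.triu(g, 1)
    # Symmetrize the noise: unit-variance Gaussian entries, Z[i, j] == Z[j, i].
    z = z + z.T + np.diag(rng.standard_normal(n))
    w = np.outer(s, s) / np.sqrt(n) + np.sqrt(delta) * z
    return s, w

def spectral_overlap(s, w):
    """Absolute normalized overlap between s and the top eigenvector of W."""
    _, eigvecs = np.linalg.eigh(w)   # eigh returns eigenvalues in ascending order
    v = eigvecs[:, -1]               # unit-norm eigenvector of the largest eigenvalue
    return abs(v @ s) / np.linalg.norm(s)

rng = np.random.default_rng(0)
n = 400

# Low noise: the top eigenvector should correlate strongly with the signal.
s_low, w_low = sample_spiked_wigner(n, delta=0.1, rng=rng)
overlap_low_noise = spectral_overlap(s_low, w_low)

# High noise: the top eigenvector should carry almost no information about s.
s_high, w_high = sample_spiked_wigner(n, delta=4.0, rng=rng)
overlap_high_noise = spectral_overlap(s_high, w_high)

print(f"overlap at delta=0.1: {overlap_low_noise:.3f}")
print(f"overlap at delta=4.0: {overlap_high_noise:.3f}")
```

Under this assumed scaling with a unit-variance signal, the spectral estimator is expected to lose its correlation with s once Δ exceeds 1. Sweeping `delta` across that value gives a quick empirical picture of one algorithmic threshold, which can then be compared against the information-theoretic predictions the abstract refers to.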
