Estimation of low-rank matrices via approximate message passing

Consider the problem of estimating a low-rank matrix whose entries are perturbed by Gaussian noise. If the empirical distribution of the entries of the spikes is known, optimal estimators that exploit this knowledge can substantially outperform simple spectral approaches. Recent work characterizes the asymptotic accuracy of Bayes-optimal estimators in the high-dimensional limit. In this paper we present a practical algorithm that achieves Bayes-optimal accuracy above the spectral threshold; a bold conjecture from statistical physics posits that no polynomial-time algorithm achieves optimal error below that same threshold (unless the best estimator is trivial). Our approach uses Approximate Message Passing (AMP) in conjunction with a spectral initialization. AMP algorithms have proved successful in a variety of statistical estimation tasks and are amenable to exact asymptotic analysis via state evolution. Unfortunately, state evolution is uninformative when the algorithm is initialized near an unstable fixed point, as often happens in low-rank matrix estimation. We develop a new analysis of AMP that allows for spectral initializations. Our main theorem is general and applies beyond matrix estimation; here we use it to derive detailed predictions for the problem of estimating a rank-one matrix in noise. Special cases of this problem are closely related, via universality arguments, to the network community detection problem for two asymmetric communities. For general rank-one models, we show that AMP can be used to construct confidence intervals and to control the false discovery rate. We illustrate the general methodology on the cases of sparse low-rank matrices and of block-constant low-rank matrices with symmetric blocks (we refer to the latter as the `Gaussian Block Model').
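
To make the pipeline concrete, below is a minimal sketch (not the paper's exact algorithm) of spectrally-initialized AMP for the symmetric rank-one model A = (λ/n) v vᵀ + W, with W drawn from the GOE and a Rademacher (±1) prior on the entries of v. The function name amp_rank_one, the fixed denoiser f(x) = tanh(x) (which omits the state-evolution rescaling needed for Bayes-optimal accuracy), and all parameter values are illustrative assumptions, not the paper's specification.

```python
import numpy as np

def amp_rank_one(A, n_iter=30):
    """Sketch of AMP for A = (lam/n) * v v^T + GOE noise, with +/-1 entries
    in v.  Uses the denoiser f(x) = tanh(x) (no state-evolution rescaling)
    and a spectral initialization, as in the paper's general scheme."""
    n = A.shape[0]
    # Spectral initialization: leading eigenvector, scaled to norm sqrt(n).
    _, eigvecs = np.linalg.eigh(A)
    x = np.sqrt(n) * eigvecs[:, -1]
    f_prev = np.zeros(n)
    for _ in range(n_iter):
        f = np.tanh(x)
        # Onsager correction: b = (1/n) * sum_i f'(x_i), with f' = 1 - tanh^2.
        b = np.mean(1.0 - f ** 2)
        # Memory term uses f evaluated at the previous iterate.
        x, f_prev = A @ f - b * f_prev, f
    return np.sign(x)

# Demo on synthetic data (illustrative parameters; lam > 1 is above the
# spectral threshold in this normalization).
rng = np.random.default_rng(0)
n, lam = 2000, 1.5
v = rng.choice([-1.0, 1.0], size=n)
G = rng.normal(size=(n, n)) / np.sqrt(n)
A = (lam / n) * np.outer(v, v) + (G + G.T) / np.sqrt(2)  # GOE noise
v_hat = amp_rank_one(A)
print("overlap:", abs(v_hat @ v) / n)  # approaches 1 as lam grows
```

The spectral initialization matters because, above the threshold λ = 1 (in this normalization), the leading eigenvector has non-vanishing overlap with v; this is what lets the iterates escape the uninformative fixed point at which a random initialization would leave the state evolution stuck.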
