Constrained low-rank matrix estimation: phase transitions, approximate message passing and applications

This article is an extended version of previous work of the authors [40, 41] on low-rank matrix estimation in the presence of constraints on the factors into which the matrix is factorized. Low-rank matrix factorization is one of the basic methods used in data analysis for unsupervised learning of relevant features and other types of dimensionality reduction. We present a framework to study constrained low-rank matrix estimation for a general prior on the factors and a general output channel through which the matrix is observed. We draw a parallel with the study of vector-spin glass models, presenting a unifying way to study a number of problems previously considered in separate statistical physics works. We present a number of applications of the problem in data analysis. We derive in detail a general form of the low-rank approximate message passing (Low-RAMP) algorithm, known in statistical physics as the TAP equations. We thus unify the derivation of the TAP equations for models as different as the Sherrington-Kirkpatrick model, the restricted Boltzmann machine, the Hopfield model, and vector (XY, Heisenberg and other) spin glasses. The state evolution of the Low-RAMP algorithm is also derived and is equivalent to the replica symmetric solution for this large class of vector-spin glass models. In the section devoted to results we study in detail the phase diagrams and phase transitions of Bayes-optimal inference in low-rank matrix estimation. We present a typology of phase transitions and their relation to the performance of algorithms such as Low-RAMP or commonly used spectral methods.
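
As a concrete illustration of the algorithm discussed above, the following is a minimal self-contained sketch (in Python, assuming NumPy) of approximate message passing for the simplest instance of the problem: the symmetric rank-one spiked Wigner model with a ±1 prior on the factor and a Gaussian output channel. It is not the general Low-RAMP derived in the paper; the size n = 2000, the signal-to-noise ratio lam = 1.5 and the iteration count are illustrative choices.

import numpy as np

rng = np.random.default_rng(0)

# Spiked Wigner model: Y = (lam / n) * x x^T + W, with a planted spike
# x_i = +-1 uniform and symmetric Gaussian noise W of variance 1/n per entry.
n, lam = 2000, 1.5
x = rng.choice([-1.0, 1.0], size=n)
G = rng.normal(0.0, 1.0 / np.sqrt(n), size=(n, n))
W = (G + G.T) / np.sqrt(2.0)
Y = (lam / n) * np.outer(x, x) + W

# AMP iteration: h is the effective local field on each variable,
# f(h) = tanh(lam * h) is the posterior-mean denoiser for the +-1 prior,
# and b * f_old is the Onsager correction for the dense-graph feedback.
h = rng.normal(0.0, 1e-3, size=n)    # weak random initialization
f_old = np.zeros(n)
for _ in range(50):
    f = np.tanh(lam * h)
    b = lam * np.mean(1.0 - f ** 2)  # b = (1/n) * sum_i f'(h_i)
    h, f_old = Y @ f - b * f_old, f

m = np.tanh(lam * h)                 # final marginal means
print("overlap with the planted spike:", abs(m @ x) / n)

For lam below 1 the printed overlap stays at the noise level, while above 1 the iteration converges to a positive overlap; this is the simplest example of the phase transitions analyzed in the paper.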

[1] C. Eckart et al. The approximation of one matrix by another of lower rank, 1936.

[2] S. Kirkpatrick et al. Solvable Model of a Spin-Glass, 1975.

[3] D. Thouless et al. Spherical Model of a Spin-Glass, 1976.

[4] R. Palmer et al. Solution of 'Solvable model of a spin glass', 1977.

[5] G. Toulouse et al. Coexistence of Spin-Glass and Ferromagnetic Orderings, 1981.

[6] H. Sommers. Theory of a Heisenberg spin glass, 1981.

[7] T. Plefka. Convergence condition of the TAP equation for the infinite-ranged Ising spin glass model, 1982.

[8] S. P. Lloyd. Least squares quantization in PCM, 1982, IEEE Transactions on Information Theory.

[9] J. J. Hopfield. Neural networks and physical systems with emergent collective computational abilities, 1982, Proceedings of the National Academy of Sciences of the United States of America.

[10] Kanter et al. Mean-field theory of the Potts glass, 1985, Physical Review Letters.

[11] Giorgio Parisi et al. SK Model: The Replica Solution without Replicas, 1986.

[12] J. Yedidia et al. How to expand around mean-field theory using high-temperature expansions, 1991.

[13] J. Nadal et al. Optimal unsupervised learning, 1994.

[14] Michael Biehl et al. Statistical mechanics of unsupervised structure recognition, 1994.

[15] Sompolinsky et al. Statistical mechanics of the maximum-likelihood density estimation, 1994, Physical Review E.

[16] Hilbert J. Kappen et al. Boltzmann Machine Learning Using Mean Field Theory and Linear Response Correction, 1997, NIPS.

[17] George M. Church et al. Biclustering of Expression Data, 2000, ISMB.

[18] H. Nishimori. Statistical Physics of Spin Glasses and Information Processing, 2001.

[19] D. Sherrington et al. Absence of replica symmetry breaking in a region of the phase diagram of the Ising spin glass, 2000, arXiv:cond-mat/0008139.

[20] H. Nishimori. Statistical Physics of Spin Glasses and Information Processing: An Introduction, 2001.

[21] William T. Freeman et al. Understanding belief propagation and its generalizations, 2003.

[22] S. Péché et al. Phase transition of the largest eigenvalue for nonnull complex sample covariance matrices, 2004, arXiv:math/0403022.

[23] Larry Wasserman. All of Statistics: A Concise Course in Statistical Inference, 2004.

[24] Arlindo L. Oliveira et al. Biclustering algorithms for biological data analysis: a survey, 2004, IEEE/ACM Transactions on Computational Biology and Bioinformatics.

[25] Huan Liu et al. Subspace clustering for high dimensional data: a review, 2004, ACM SIGKDD Explorations.

[26] M. Rattray et al. Principal-component-analysis eigenvalue spectra from data with symmetry-breaking structure, 2004, Physical Review E.

[27] J. Franklin et al. The elements of statistical learning: data mining, inference and prediction, 2005.

[28] R. Tibshirani et al. Sparse Principal Component Analysis, 2006.

[29] Yee Whye Teh et al. A Fast Learning Algorithm for Deep Belief Nets, 2006, Neural Computation.

[30] A. Montanari et al. Rigorous Inequalities Between Length and Time Scales in Glassy Systems, 2006, arXiv:cond-mat/0603018.

[31] M. Wainwright et al. High-dimensional analysis of semidefinite relaxations for sparse principal components, 2008, 2008 IEEE International Symposium on Information Theory.

[32] I. Johnstone et al. Sparse Principal Components Analysis, 2009, arXiv:0901.4392.

[33] Andrea Montanari et al. Message-passing algorithms for compressed sensing, 2009, Proceedings of the National Academy of Sciences.

[34] Santo Fortunato. Community detection in graphs, 2009, arXiv.

[35] Robert Tibshirani et al. The Elements of Statistical Learning: Data Mining, Inference, and Prediction, 2nd Edition, 2001, Springer Series in Statistics.

[36] Sundeep Rangan et al. Estimation with random linear mixing, belief propagation and compressed sensing, 2010, 2010 44th Annual Conference on Information Sciences and Systems (CISS).

[37] Andrea Montanari et al. The dynamics of message passing on dense graphs, with applications to compressed sensing, 2010, ISIT.

[38] Cristopher Moore et al. Asymptotic analysis of the stochastic block model for modular networks and its algorithmic applications, 2011, Physical Review E.

[39] Adel Javanmard et al. State Evolution for General Approximate Message Passing Algorithms, with Applications to Spatial Coupling, 2012, arXiv.

[40] Sundeep Rangan et al. Iterative estimation of constrained rank-one matrices in noise, 2012, 2012 IEEE International Symposium on Information Theory Proceedings.

[41] Geoffrey E. Hinton. A Practical Guide to Training Restricted Boltzmann Machines, 2012, Neural Networks: Tricks of the Trade.

[42] Elchanan Mossel et al. Spectral redemption in clustering sparse networks, 2013, Proceedings of the National Academy of Sciences.

[43] Andrea Montanari et al. Finding Hidden Cliques of Size √(N/e) in Nearly Linear Time, 2013, arXiv.

[44] Philippe Rigollet et al. Complexity Theoretic Lower Bounds for Sparse Principal Component Detection, 2013, COLT.

[45] Philippe Rigollet et al. Computational Lower Bounds for Sparse PCA, 2013, arXiv.

[46] Florent Krzakala et al. Phase diagram and approximate message passing for blind calibration and dictionary learning, 2013, 2013 IEEE International Symposium on Information Theory.

[47] Toshiyuki Tanaka et al. Low-rank matrix reconstruction and clustering via approximate message passing, 2013, NIPS.

[48] Volkan Cevher et al. Fixed Points of Generalized Approximate Message Passing With Arbitrary Matrices, 2016, IEEE Transactions on Information Theory.

[49] Andrea Montanari et al. Information-theoretically optimal sparse PCA, 2014, 2014 IEEE International Symposium on Information Theory.

[50] Andrea Montanari et al. A statistical model for tensor PCA, 2014, NIPS.

[51] Florent Krzakala et al. Variational free energies for compressed sensing, 2014, 2014 IEEE International Symposium on Information Theory.

[52] Volkan Cevher et al. Bilinear Generalized Approximate Message Passing, Part I: Derivation, 2013, IEEE Transactions on Signal Processing.

[53] Florent Krzakala et al. On convergence of approximate message passing, 2014, 2014 IEEE International Symposium on Information Theory.

[54] Florent Krzakala et al. Phase transitions in sparse PCA, 2015, 2015 IEEE International Symposium on Information Theory (ISIT).

[55] Andrea Montanari et al. Finding Hidden Cliques of Size √(N/e) in Nearly Linear Time, 2013, Foundations of Computational Mathematics.

[56] Sundeep Rangan et al. Adaptive damping and mean removal for the generalized approximate message passing algorithm, 2014, 2015 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).

[57] Florent Krzakala et al. Training Restricted Boltzmann Machines via the Thouless-Anderson-Palmer Free Energy, 2015, NIPS.

[58] Florent Krzakala et al. Statistical physics of inference: thresholds and algorithms, 2015, arXiv.

[59] Florent Krzakala et al. MMSE of probabilistic low-rank matrix estimation: Universality with respect to the output channel, 2015, 2015 53rd Annual Allerton Conference on Communication, Control, and Computing (Allerton).

[60] R. Monasson et al. Estimating the principal components of correlation matrices from all their empirical eigenvectors, 2015, arXiv:1503.00287.

[61] Andrea Montanari et al. Finding One Community in a Sparse Graph, 2015, Journal of Statistical Physics.

[62] B. Nadler et al. Do semidefinite relaxations solve sparse PCA up to the information limit?, 2013, arXiv:1306.3690.

[63] Florent Krzakala et al. Inferring sparsity: Compressed sensing using generalized restricted Boltzmann machines, 2016, 2016 IEEE Information Theory Workshop (ITW).

[64] Marc Lelarge et al. Recovering Asymmetric Communities in the Stochastic Block Model, 2018, IEEE Transactions on Network Science and Engineering.

[65] Guigang Zhang et al. Deep Learning, 2016, International Journal of Semantic Computing.

[66] Florent Krzakala et al. Mutual information in rank-one matrix estimation, 2016, 2016 IEEE Information Theory Workshop (ITW).

[67] Andrea Montanari et al. Asymptotic mutual information for the binary stochastic block model, 2016, 2016 IEEE International Symposium on Information Theory (ISIT).

[68] Jess Banks et al. Phase transitions and optimal algorithms in high-dimensional Gaussian mixture clustering, 2016, 2016 54th Annual Allerton Conference on Communication, Control, and Computing (Allerton).

[69] Ankur Moitra et al. Message-Passing Algorithms for Synchronization Problems over Compact Groups, 2016, arXiv.

[70] Sebastian Fischer et al. Exploring Artificial Intelligence in the New Millennium, 2016.

[71] Andrea Montanari et al. Sparse PCA via Covariance Thresholding, 2013, Journal of Machine Learning Research.

[72] Florent Krzakala et al. Phase Transitions and Sample Complexity in Bayes-Optimal Matrix Factorization, 2014, IEEE Transactions on Information Theory.

[73] Ankur Moitra et al. Optimality and Sub-optimality of PCA for Spiked Random Matrices and Synchronization, 2016, arXiv.

[74] Nicolas Macris et al. Mutual information for symmetric rank-one matrix estimation: A proof of the replica formula, 2016, NIPS.

[75] Léo Miolane. Fundamental limits of low-rank matrix estimation, 2017.

[76] Rémi Monasson et al. Emergence of Compositional Representations in Restricted Boltzmann Machines, 2016, Physical Review Letters.

[77] M. Mézard. Mean-field message-passing equations in the Hopfield model and its generalizations, 2016, Physical Review E.

[78] Marc Lelarge et al. Fundamental limits of symmetric low-rank matrix estimation, 2016, Probability Theory and Related Fields.

[79] Nicolas Macris et al. Rank-one matrix estimation: analysis of algorithmic and information theoretic limits by the spatial coupling method, 2018, arXiv.

[80] Jess Banks et al. Information-theoretic bounds and phase transitions in clustering, sparse PCA, and submatrix localization, 2016, 2017 IEEE International Symposium on Information Theory (ISIT).