Global analytic solution of fully-observed variational Bayesian matrix factorization

The variational Bayesian (VB) approximation is a promising approach to Bayesian estimation when exact computation of the Bayes posterior is intractable. It has been successfully applied to matrix factorization (MF), where it offers automatic dimensionality selection for principal component analysis. Finding the VB solution is in general a nonconvex problem, and most methods rely on a local search algorithm derived through the standard VB procedure. In this paper, we show that a better option is available for fully-observed VBMF: the global solution can be computed analytically. More specifically, the global solution is a reweighted SVD of the observed matrix, and each weight is obtained by solving a quartic equation whose coefficients are functions of the observed singular value. We further show that the global solution of empirical VBMF (where hyperparameters are also learned from data) can likewise be computed analytically. We illustrate the usefulness of our results through experiments in multivariate analysis.
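To make the phrase "reweighted SVD" concrete, the following minimal sketch shows only the structure of such an estimator: each singular component of the observed matrix is kept but rescaled by a weight that depends on its singular value. The function names `reweighted_svd` and `placeholder_weight` are hypothetical, and the weight rule used here is a simple positive-part shrinkage chosen for illustration. It is an assumption, not the quartic-equation solution described in the abstract, whose coefficients are not given here.

```python
import numpy as np


def reweighted_svd(V, weight_fn):
    """Rescale each singular component of V by weight_fn(singular value).

    Returns the reweighted matrix and the weights that were applied.
    """
    U, s, Vt = np.linalg.svd(V, full_matrices=False)
    w = np.array([weight_fn(s_h) for s_h in s])
    # Multiply column h of U by w[h] * s[h], then recombine with Vt.
    return (U * (w * s)) @ Vt, w


def placeholder_weight(s_h, threshold=1.0):
    """Hypothetical stand-in weight (positive-part shrinkage).

    In the paper's method the weight would instead come from a root of a
    quartic equation in the observed singular value; that equation is not
    reproduced here.
    """
    if s_h <= 0.0:
        return 0.0
    return max(0.0, 1.0 - threshold**2 / s_h**2)


# Toy observed matrix with one strong and one weak component.
V = np.diag([3.0, 0.5])
estimate, weights = reweighted_svd(V, placeholder_weight)
```

With this placeholder rule, components whose singular value falls below the threshold receive weight zero, which is the mechanism by which a reweighted SVD can discard dimensions automatically.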
