Fast Variational Bayesian Inference for Non-Conjugate Matrix Factorization Models

Probabilistic matrix factorization (MF) methods aim to extract meaningful correlation structure from an incomplete data matrix by postulating low-rank constraints. Recently, variational Bayesian (VB) inference techniques have been applied successfully to such large-scale bilinear models. However, current algorithms are of the alternating-update or stochastic gradient descent type: they are slow to converge and prone to getting stuck in shallow local minima. For MAP or maximum-margin estimation, singular value shrinkage algorithms have been proposed which can far outperform alternating updates, but this methodological avenue remains unexplored for Bayesian techniques. In this paper, we show how to combine a recent singular value shrinkage characterization of fully observed spherical Gaussian VB matrix factorization with local variational bounding in order to obtain efficient VB inference for general MF models with non-conjugate likelihood potentials. In particular, we show how to handle Poisson and Bernoulli potentials, which suit most MF applications far better than Gaussian likelihoods. Our algorithm can be run even for very large models and is easily implemented in Matlab. It exhibits significantly better prediction performance than MAP estimation on a range of real-world datasets.
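To illustrate what local variational bounding can look like for a Bernoulli (logistic) potential, the sketch below implements the well-known Jaakkola-Jordan quadratic lower bound on the log-sigmoid. The abstract does not say which bound the paper uses, so choosing this one is an assumption, and the function names `log_sigmoid`, `jj_lambda` and `jj_lower_bound` are hypothetical. Because the bound is quadratic in its argument, it has the same functional form as a Gaussian log-likelihood, which is what would make a Gaussian-type update applicable to the bounded model.

```python
import math


def log_sigmoid(x):
    """Numerically stable log(1 / (1 + exp(-x)))."""
    if x >= 0:
        return -math.log1p(math.exp(-x))
    return x - math.log1p(math.exp(x))


def jj_lambda(xi):
    """Curvature coefficient lambda(xi) = tanh(xi / 2) / (4 * xi).

    The limit as xi -> 0 is 1/8, used here to avoid dividing by zero.
    """
    if abs(xi) < 1e-8:
        return 0.125
    return math.tanh(xi / 2.0) / (4.0 * xi)


def jj_lower_bound(x, xi):
    """Jaakkola-Jordan bound on log_sigmoid(x) with variational parameter xi.

    The bound is quadratic in x and touches log_sigmoid at x = xi and x = -xi.
    Illustrative only: the paper's actual bound is not given in the abstract.
    """
    return log_sigmoid(xi) + (x - xi) / 2.0 - jj_lambda(xi) * (x * x - xi * xi)
```

In a scheme of this kind, `xi` would be a free parameter per observed entry, re-tightened between updates of the factor posterior; that usage pattern is likewise an assumption and not a description of the paper's algorithm.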
