A Unified View of Matrix Factorization Models

We present a unified view of matrix factorization that frames the differences among popular methods, such as NMF, Weighted SVD, E-PCA, MMMF, pLSI, pLSI-pHITS, Bregman co-clustering, and many others, in terms of a small number of modeling choices. Many of these approaches can be viewed as minimizing a generalized Bregman divergence, and we show that (i) a straightforward alternating projection algorithm can be applied to almost any model in our unified view; (ii) the Hessian for each projection has special structure that makes a Newton projection feasible, even when there are equality constraints on the factors, which allows for matrix co-clustering; and (iii) alternating projections can be generalized to simultaneously factor a set of matrices that share dimensions. These observations immediately yield new optimization algorithms for the above factorization methods, and suggest novel generalizations of these methods such as incorporating row and column biases, and adding or relaxing clustering constraints.
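As a minimal sketch of the alternating projection idea in (i) and (ii), consider the simplest member of the family: weighted squared loss, which is the Bregman divergence generated by F(x) = x²/2. With one factor fixed, the Hessian of the other is block diagonal with one small k × k block per row, so a single Newton step per row solves the projection exactly. The function names, the ridge term, and the NumPy implementation below are illustrative assumptions, not the authors' code; other divergences would need an iterated Newton step with a different gradient and Hessian.

```python
import numpy as np

def project_rows(X, W, B, ridge):
    """With B fixed, solve for A row by row so that X ~ A @ B.T.

    Under weighted squared loss the Hessian is block diagonal (one k x k
    block per row of A), so each row is an independent linear solve.
    """
    k = B.shape[1]
    A = np.empty((X.shape[0], k))
    for i in range(X.shape[0]):
        Bw = B * W[i][:, None]            # rows of B scaled by row i's weights
        H = B.T @ Bw + ridge * np.eye(k)  # k x k Hessian block (ridge is an assumption)
        g = Bw.T @ X[i]                   # weighted right-hand side
        A[i] = np.linalg.solve(H, g)      # exact Newton step for a quadratic loss
    return A

def alternating_projections(X, W, k, n_iters=50, ridge=1e-6, seed=0):
    """Factor X ~ U @ V.T by alternating the two row-wise projections."""
    rng = np.random.default_rng(seed)
    m, n = X.shape
    U = rng.standard_normal((m, k))
    V = rng.standard_normal((n, k))
    for _ in range(n_iters):
        U = project_rows(X, W, V, ridge)      # fix V, update U
        V = project_rows(X.T, W.T, U, ridge)  # fix U, update V
    return U, V

def weighted_loss(X, W, U, V):
    """Weighted squared reconstruction error."""
    return 0.5 * np.sum(W * (X - U @ V.T) ** 2)

if __name__ == "__main__":
    rng = np.random.default_rng(1)
    X = rng.standard_normal((8, 2)) @ rng.standard_normal((2, 6))  # exact rank 2
    W = np.ones_like(X)                                            # all entries observed
    U, V = alternating_projections(X, W, k=2)
    print("final loss:", weighted_loss(X, W, U, V))
```

Setting entries of `W` to zero marks missing data, which is how this sketch would cover the weighted low-rank setting. Factoring several matrices that share a dimension, as in (iii), would amount to summing the per-matrix Hessian blocks and gradients for the shared factor before each solve.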
