Online Learning in the Embedded Manifold of Low-rank Matrices

When learning models that are represented in matrix forms, enforcing a low-rank constraint can dramatically improve the memory and run time complexity, while providing a natural regularization of the model. However, naive approaches to minimizing functions over the set of low-rank matrices are either prohibitively time consuming (repeated singular value decomposition of the matrix) or numerically unstable (optimizing a factored representation of the low-rank matrix). We build on recent advances in optimization over manifolds, and describe an iterative online learning procedure, consisting of a gradient step, followed by a second-order retraction back to the manifold. While the ideal retraction is costly to compute, and so is the projection operator that approximates it, we describe another retraction that can be computed efficiently. It has run time and memory complexity of O((n+m)k) for a rank-k matrix of dimension m×n, when using an online procedure with rank-one gradients. We use this algorithm, LORETA, to learn a matrix-form similarity measure over pairs of documents represented as high dimensional vectors. LORETA improves the mean average precision over a passive-aggressive approach in a factorized model, and also improves over a full model trained on pre-selected features using the same memory requirements. We further adapt LORETA to learn positive semi-definite low-rank matrices, providing an online algorithm for low-rank metric learning. LORETA also shows consistent improvement over standard weakly supervised methods in a large (1600 classes and 1 million images, using ImageNet) multilabel image classification task.
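The two-stage update described above (a gradient step, then a map back to the rank-k manifold) can be illustrated with a minimal sketch. Note that this sketch uses the truncated-SVD projection, which is the costly operation the abstract says LORETA avoids; it is not LORETA's efficient retraction. The function names `project_to_rank_k` and `online_step`, the step size `eta`, and the toy objective (increasing a bilinear score p^T W q, whose gradient is the rank-one matrix p q^T) are assumptions made for illustration only.

```python
import numpy as np

def project_to_rank_k(W, k):
    """Best rank-k approximation of W in Frobenius norm, via truncated SVD.

    This is the expensive baseline projection, not LORETA's retraction.
    """
    U, s, Vt = np.linalg.svd(W, full_matrices=False)
    # Keep only the k largest singular values and their singular vectors.
    return (U[:, :k] * s[:k]) @ Vt[:k, :]

def online_step(W, p, q, eta, k):
    """One online update: rank-one gradient step, then map back to rank k.

    Assumed toy objective: increase the bilinear score p^T W q.
    """
    grad = np.outer(p, q)  # gradient of p^T W q with respect to W (rank one)
    return project_to_rank_k(W + eta * grad, k)

# Toy dimensions (hypothetical): a rank-2 matrix of size m x n.
rng = np.random.default_rng(0)
m, n, k = 8, 6, 2
W = rng.standard_normal((m, k)) @ rng.standard_normal((k, n))
p, q = rng.standard_normal(m), rng.standard_normal(n)

W_new = online_step(W, p, q, eta=0.1, k=k)
```

The intermediate matrix `W + eta * grad` generally has rank k+1, so each step leaves the manifold and must be mapped back. Doing so with a full SVD costs far more than the O((n+m)k) per-step budget quoted above, which is the motivation for a cheaper retraction.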
