Efficient and Practical Stochastic Subgradient Descent for Nuclear Norm Regularization

We describe novel subgradient methods for a broad class of matrix optimization problems involving nuclear norm regularization. Unlike existing approaches, our method executes very cheap iterations by combining low-rank stochastic subgradients with efficient incremental SVD updates, made possible by highly optimized and parallelizable dense linear algebra operations on small matrices. Our practical algorithms always maintain a low-rank factorization of iterates that can be conveniently held in memory and efficiently multiplied to generate predictions in matrix completion settings. Empirical comparisons confirm that our approach is highly competitive with several recently proposed state-of-the-art solvers for such problems.

[1]  Philip Wolfe,et al.  An algorithm for quadratic programming , 1956 .

[2]  G. Watson Characterization of the subdifferential of some matrix norms , 1992 .

[3]  Jack Dongarra,et al.  ScaLAPACK: a scalable linear algebra library for distributed memory concurrent computers , 1992, [Proceedings 1992] The Fourth Symposium on the Frontiers of Massively Parallel Computation.

[4]  Balas K. Natarajan,et al.  Sparse Approximate Solutions to Linear Systems , 1995, SIAM J. Comput..

[5]  S. Mallat,et al.  Adaptive greedy approximations , 1997 .

[6]  L. Trefethen,et al.  Numerical linear algebra , 1997 .

[7]  J. Borwein,et al.  Techniques of variational analysis , 2005 .

[8]  Léon Bottou,et al.  The Tradeoffs of Large Scale Learning , 2007, NIPS.

[9]  Elad Hazan,et al.  Sparse Approximate Solutions to Semidefinite Programs , 2008, LATIN.

[10]  Kenneth L. Clarkson,et al.  Coresets, sparse greedy approximation, and the Frank-Wolfe algorithm , 2008, SODA '08.

[11]  Jieping Ye,et al.  An accelerated gradient method for trace norm minimization , 2009, ICML '09.

[12]  Emmanuel J. Candès,et al.  A Singular Value Thresholding Algorithm for Matrix Completion , 2008, SIAM J. Optim..

[13]  TibshiraniRobert,et al.  Spectral Regularization Algorithms for Learning Large Incomplete Matrices , 2010 .

[14]  Martin Jaggi,et al.  A Simple Algorithm for Nuclear Norm Regularized Problems , 2010, ICML.

[15]  Mark Hoemmen,et al.  Communication-avoiding Krylov subspace methods , 2010 .

[16]  Robert Tibshirani,et al.  Spectral Regularization Algorithms for Learning Large Incomplete Matrices , 2010, J. Mach. Learn. Res..

[17]  David F. Gleich,et al.  Tall and skinny QR factorizations in MapReduce architectures , 2011, MapReduce '11.

[18]  James Demmel,et al.  Communication-Avoiding QR Decomposition for GPUs , 2011, 2011 IEEE International Parallel & Distributed Processing Symposium.

[19]  Ohad Shamir,et al.  Large-Scale Convex Minimization with a Low-Rank Constraint , 2011, ICML.

[20]  James Demmel,et al.  Communication-optimal Parallel and Sequential QR and LU Factorizations , 2008, SIAM J. Sci. Comput..

[21]  Andrew Cotter,et al.  Stochastic Optimization for Machine Learning , 2013, ArXiv.

[22]  Christopher Ré,et al.  Parallel stochastic gradient algorithms for large-scale matrix completion , 2013, Mathematical Programming Computation.