Lifted coordinate descent for learning with trace-norm regularization

We consider the minimization of a smooth loss with trace-norm regularization, an objective that arises naturally in multiclass and multitask learning. Even though the problem is convex, existing approaches either optimize a non-convex variational bound, which is not guaranteed to converge, or repeatedly perform singular-value decompositions, which prevents scaling beyond moderate matrix sizes. We lift the non-smooth convex problem into an infinite-dimensional smooth problem and apply coordinate descent to solve it. We prove that our approach converges to the optimum, and it is competitive with or outperforms the state of the art.
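A minimal sketch of the lifting idea follows. It rests on assumptions of our own. The smooth loss is taken to be the multitask least-squares loss 0.5 * ||XW - Y||_F^2. The trace norm is written as the smallest total weight sum(theta_i) over representations W = sum_i theta_i a_i b_i^T with unit vectors a_i, b_i and theta_i >= 0. Coordinate descent is then run on the weights theta, where the penalty becomes linear and therefore smooth. Each outer step adds the coordinate with the most negative directional derivative, which is the leading singular pair of the gradient, and then minimizes exactly over each active weight. The function names (`lifted_cd`, `compact`, `assemble`, `objective`) and the SVD-based compaction step are hypothetical illustrations, not the method described above. This bare version also makes no attempt at speed and can converge slowly near the optimum.

```python
import numpy as np


def top_singular_pair(G):
    # Only the leading singular triplet of G is needed; a full SVD is used here
    # purely for brevity (a power or Lanczos iteration would supply it at scale).
    U, s, Vt = np.linalg.svd(G, full_matrices=False)
    return U[:, 0], s[0], Vt[0]


def assemble(atoms, shape):
    # W = sum_i theta_i * a_i b_i^T
    W = np.zeros(shape)
    for a, b, theta in atoms:
        W += theta * np.outer(a, b)
    return W


def compact(atoms):
    # Hypothetical helper: re-express W = A B^T through its SVD, using thin QR
    # factors of A and B so that only a small (#atoms x #atoms) SVD is needed.
    # The new weights are singular values, so sum(theta) can only drop to ||W||_*.
    A = np.column_stack([theta * a for a, _, theta in atoms])
    B = np.column_stack([b for _, b, _ in atoms])
    Qa, Ra = np.linalg.qr(A)
    Qb, Rb = np.linalg.qr(B)
    P, s, Zt = np.linalg.svd(Ra @ Rb.T, full_matrices=False)
    keep = s > 1e-12
    U, V, sk = Qa @ P[:, keep], Qb @ Zt[keep].T, s[keep]
    return [[U[:, j], V[:, j], sk[j]] for j in range(U.shape[1])]


def objective(X, Y, W, lam):
    # 0.5 * ||XW - Y||_F^2 + lam * ||W||_*  (trace norm = nuclear norm)
    return 0.5 * np.sum((X @ W - Y) ** 2) + lam * np.linalg.norm(W, "nuc")


def lifted_cd(X, Y, lam, max_outer=300, inner_passes=10, tol=1e-9):
    shape = (X.shape[1], Y.shape[1])
    W, atoms = np.zeros(shape), []  # each atom is [a, b, theta] with theta >= 0
    for _ in range(max_outer):
        G = X.T @ (X @ W - Y)  # gradient of the smooth loss at W
        u, sigma, v = top_singular_pair(G)
        # Derivative of the lifted objective w.r.t. each active weight theta_i.
        gaps = [abs(np.sum(G * np.outer(a, b)) + lam) for a, b, _ in atoms]
        if sigma <= lam + tol and max(gaps, default=0.0) <= tol:
            break  # no atom can be increased profitably; active weights stationary
        if sigma > lam + tol:
            # The atom -u v^T has directional derivative lam - sigma < 0,
            # so it is updated first (an exact greedy line search).
            atoms.insert(0, [-u, v, 0.0])
        for _ in range(inner_passes):
            for atom in atoms:
                a, b, theta = atom
                D = np.outer(a, b)
                XD = X @ D
                grad = np.sum((X @ W - Y) * XD) + lam
                curv = np.sum(XD * XD)
                # Exact minimization of a 1-D convex quadratic over theta >= 0
                # (if XD = 0 the derivative is lam > 0, so theta goes to 0).
                new_theta = 0.0 if curv == 0 else max(0.0, theta - grad / curv)
                W += (new_theta - theta) * D
                atom[2] = new_theta
            atoms = [atom for atom in atoms if atom[2] > 0.0]
        if atoms:
            atoms = compact(atoms)
        W = assemble(atoms, shape)
    return W, atoms
```

When X has orthonormal columns, the problem reduces to soft-thresholding the singular values of X^T Y by `lam`, so the sketch should recover that closed-form solution after a few outer steps, which makes a convenient sanity check.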
