Accelerated Coordinate Descent with Adaptive Coordinate Frequencies

Coordinate descent (CD) algorithms have become the method of choice for solving a number of machine learning tasks. They are particularly popular for training linear models, including linear support vector machine classification, LASSO regression, and logistic regression. We propose an extension of the CD algorithm, called the adaptive coordinate frequencies (ACF) method. This modified CD scheme does not treat all coordinates equally, in that it does not pick all coordinates equally often for optimization. Instead the relative frequencies of coordinates are subject to online adaptation. The resulting optimization scheme can result in significant speed-ups. We demonstrate the usefulness of our approach on a number of large scale machine learning problems.
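To make the idea concrete, the following is a minimal sketch of a CD loop in which each coordinate carries a selection preference that is adapted online from the progress its steps achieve. The objective (ridge regression), the function name `acf_coordinate_descent`, and the multiplicative preference-update rule with its constants are illustrative assumptions, not the exact scheme of the paper.

```python
# Sketch of coordinate descent with adaptively weighted coordinate selection.
# Illustrative only: the preference-update rule and constants are assumptions,
# not the paper's exact ACF scheme. Problem: ridge regression,
#   min_w 0.5*||X w - y||^2 + 0.5*lam*||w||^2,
# whose coordinate sub-problems have closed-form minimizers.
import numpy as np

def acf_coordinate_descent(X, y, lam=1.0, epochs=20, seed=0,
                           eta=0.2, p_min=1e-3, p_max=1e3):
    rng = np.random.default_rng(seed)
    n, d = X.shape
    w = np.zeros(d)
    residual = -y.copy()                 # residual = X w - y (w starts at 0)
    col_sq = (X ** 2).sum(axis=0) + lam  # per-coordinate curvature
    pref = np.ones(d)                    # relative selection frequencies (adapted online)
    avg_gain = 1.0                       # running average of per-step progress

    for _ in range(epochs * d):
        # sample a coordinate proportionally to its current preference
        probs = pref / pref.sum()
        j = rng.choice(d, p=probs)

        # exact minimization of the j-th coordinate sub-problem
        grad_j = X[:, j] @ residual + lam * w[j]
        delta = -grad_j / col_sq[j]
        gain = 0.5 * col_sq[j] * delta ** 2   # objective decrease of this step
        w[j] += delta
        residual += delta * X[:, j]

        # adapt: boost coordinates whose steps achieve above-average progress
        # (multiplicative update with clipping; the exact rule is an assumption)
        ratio = gain / (avg_gain + 1e-12)
        pref[j] = np.clip(pref[j] * np.exp(eta * (ratio - 1.0)), p_min, p_max)
        avg_gain = 0.99 * avg_gain + 0.01 * gain

    return w

if __name__ == "__main__":
    rng = np.random.default_rng(1)
    X = rng.standard_normal((200, 50))
    w_true = rng.standard_normal(50)
    y = X @ w_true + 0.1 * rng.standard_normal(200)
    w = acf_coordinate_descent(X, y, lam=0.5)
    print("relative error:", np.linalg.norm(w - w_true) / np.linalg.norm(w_true))
```

In this sketch, plain uniform CD corresponds to keeping `pref` constant; the adaptation merely shifts sampling mass toward coordinates whose updates have recently yielded above-average decrease in the objective.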
