Stochastic Orthant-Wise Limited-Memory Quasi-Newton Method

The $\ell_1$-regularized sparse model has been popular in the machine learning community. The orthant-wise limited-memory quasi-Newton (OWL-QN) method is a representative fast algorithm for training such models. However, multiple sources have pointed out that its convergence proof is flawed, and to date its convergence has not been established. In this paper, we propose a stochastic OWL-QN method for solving $\ell_1$-regularized problems with both convex and non-convex loss functions, resolving technical difficulties that have stood for many years. We propose three alignment steps, generalized from the original OWL-QN algorithm, that encourage each parameter update to be orthant-wise. We also adopt several practical features from recent stochastic variants of L-BFGS, together with variance reduction for the subsampled gradients. To the best of our knowledge, this is the first orthant-wise algorithm with a theoretical convergence rate comparable to that of stochastic first-order methods. We prove a linear convergence rate for our algorithm under strong convexity, and we experimentally demonstrate that it achieves state-of-the-art performance on $\ell_1$-regularized logistic regression and convolutional neural networks.
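To make the orthant-wise machinery concrete, below is a minimal Python sketch of the three classical OWL-QN building blocks that the alignment steps generalize: the pseudo-gradient of $F(x) = f(x) + \lambda \|x\|_1$, the alignment of the search direction with the descent orthant, and the projection of the trial point back into that orthant. The function names are illustrative, and `grad_f` stands in for the full gradient; in the stochastic algorithm it would be replaced by a variance-reduced subsampled gradient, and `d` by the L-BFGS direction.

```python
import numpy as np

def pseudo_gradient(x, grad_f, lam):
    """Pseudo-gradient of F(x) = f(x) + lam * ||x||_1.

    Where x_i != 0 the l1 term is differentiable. At x_i == 0 the
    one-sided derivative that allows descent is chosen, or 0 when the
    origin is a coordinate-wise minimum.
    """
    pg = np.where(x > 0, grad_f + lam, grad_f - lam)
    right = grad_f + lam   # right derivative of F at x_i = 0
    left = grad_f - lam    # left derivative of F at x_i = 0
    pg_zero = np.where(right < 0, right, np.where(left > 0, left, 0.0))
    return np.where(x == 0, pg_zero, pg)

def align_direction(d, pg):
    """Alignment step: zero out components of the search direction that
    disagree in sign with the steepest-descent direction -pg."""
    return np.where(d * pg < 0, d, 0.0)

def project_orthant(x_new, x, pg):
    """Orthant projection: clip to zero any coordinate of the trial point
    that leaves the orthant chosen at the current iterate."""
    xi = np.where(x != 0, np.sign(x), np.sign(-pg))  # chosen orthant
    return np.where(np.sign(x_new) == xi, x_new, 0.0)

# Toy usage: one aligned step for F(x) = 0.5*||x - b||^2 + lam*||x||_1,
# using -pg as a stand-in for the quasi-Newton direction.
b = np.array([0.8, -0.05, 0.0])
x = np.zeros(3)
lam = 0.1
grad = x - b
pg = pseudo_gradient(x, grad, lam)
d = align_direction(-pg, pg)
x_new = project_orthant(x + d, x, pg)   # -> [0.7, 0.0, 0.0], sparse as expected
```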
