Stochastic Dual Coordinate Descent with Bandit Sampling

Coordinate descent methods minimize a cost function by updating a single decision variable (corresponding to one coordinate) at a time. Ideally, one would update the decision variable that yields the largest marginal decrease in the cost function; finding it, however, would require evaluating all coordinates, which is computationally impractical. We instead propose a new adaptive method for coordinate descent: we define a lower bound on the decrease of the cost function when a coordinate is updated and, rather than computing this lower bound for all coordinates, we use a multi-armed bandit algorithm to learn which coordinates yield the largest marginal decrease while simultaneously performing coordinate descent. We show, both theoretically and experimentally, that our approach improves the convergence of coordinate descent methods, including parallel versions. A minimal sketch of this idea appears below.
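To make the idea concrete, here is a minimal Python sketch, not the paper's algorithm: it runs coordinate descent on an assumed strongly convex quadratic objective and uses an EXP3-style bandit (one arm per coordinate) to bias sampling toward coordinates whose updates recently produced large decreases. The function name bandit_coordinate_descent, the step parameters eta and gamma, and the use of the realized decrease as the bandit reward (in place of the paper's lower bound) are all illustrative assumptions.

```python
import numpy as np

def bandit_coordinate_descent(A, b, n_iters=2000, eta=0.1, gamma=0.1, seed=0):
    """Minimize f(x) = 0.5 * x^T A x - b^T x by coordinate descent,
    choosing coordinates with an EXP3-style bandit (one arm per coordinate).
    Illustrative sketch only; rewards here are realized decreases, not the
    paper's lower bound."""
    rng = np.random.default_rng(seed)
    n = len(b)
    x = np.zeros(n)
    weights = np.ones(n)  # bandit weights, one per coordinate

    def f(x):
        return 0.5 * x @ A @ x - b @ x

    for _ in range(n_iters):
        # Mix exploitation (learned weights) with uniform exploration.
        probs = (1 - gamma) * weights / weights.sum() + gamma / n
        i = rng.choice(n, p=probs)

        # Exact coordinate minimization for the quadratic: solve df/dx_i = 0.
        grad_i = A[i] @ x - b[i]
        old_f = f(x)
        x[i] -= grad_i / A[i, i]
        decrease = old_f - f(x)  # realized marginal decrease, used as reward

        # EXP3 update, importance-weighted by the sampling probability.
        weights[i] *= np.exp(eta * decrease / probs[i])
        weights /= weights.max()  # rescale to avoid numeric overflow

    return x

# Usage: a random strongly convex quadratic.
rng = np.random.default_rng(1)
M = rng.standard_normal((50, 50))
A = M @ M.T + 50 * np.eye(50)  # symmetric positive definite
b = rng.standard_normal(50)
x_hat = bandit_coordinate_descent(A, b)
print("residual:", np.linalg.norm(A @ x_hat - b))
```

The EXP3-style update is a natural fit here because the per-coordinate rewards are non-stationary: as the iterate approaches the optimum, the marginal decrease available from each coordinate shrinks, so the bandit must keep adapting rather than converge to a fixed arm.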
