IS-ASGD: Accelerating Asynchronous SGD using Importance Sampling

Variance reduction (VR) techniques for accelerating the convergence rate of the stochastic gradient descent (SGD) algorithm have received considerable attention in recent years. Two variants of VR, stochastic variance-reduced gradient (SVRG-SGD) and importance sampling (IS-SGD), have achieved remarkable progress. Meanwhile, asynchronous SGD (ASGD) is becoming increasingly important due to the ever-growing scale of optimization problems. Applying VR to accelerate the convergence of ASGD has therefore attracted much interest, and SVRG-based ASGD methods (SVRG-ASGD) have been proposed. However, we find that SVRG performs unsatisfactorily in accelerating ASGD when datasets are sparse and large-scale: in that setting, the per-iteration computation cost of SVRG-ASGD is orders of magnitude higher than that of plain ASGD, which makes it very inefficient. In contrast, IS achieves an improved convergence rate with little extra computation cost and is insensitive to the sparsity of the dataset. These advantages make it well suited to accelerating ASGD on large-scale sparse datasets. In this paper, we propose IS-ASGD, a novel ASGD algorithm that incorporates importance sampling for efficient convergence-rate acceleration. We theoretically prove the superior convergence bound of IS-ASGD, and experimental results further support our claims.
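To make the importance-sampling idea referenced above concrete, the following is a minimal sketch of serial IS-SGD, not the paper's IS-ASGD algorithm: it assumes L2-regularized logistic regression, samples each example with probability proportional to an assumed per-example smoothness constant (taken here as ||x_i||^2/4 plus the regularization strength), and rescales the sampled gradient by 1/(n p_i) so it remains an unbiased estimate of the full gradient. The function name `is_sgd` and all parameter choices are hypothetical.

```python
# Illustrative sketch of importance-sampling SGD (IS-SGD) for
# L2-regularized logistic regression. The sampling weights and
# hyperparameters are assumptions for demonstration only.
import numpy as np

def is_sgd(X, y, lam=1e-4, lr=0.1, n_iters=10000, seed=0):
    rng = np.random.default_rng(seed)
    n, d = X.shape
    w = np.zeros(d)

    # Sampling distribution: probability proportional to each example's
    # smoothness constant (for logistic loss, L_i ~ ||x_i||^2 / 4 + lam).
    L = 0.25 * np.sum(X * X, axis=1) + lam
    p = L / L.sum()

    for _ in range(n_iters):
        i = rng.choice(n, p=p)
        margin = y[i] * X[i].dot(w)
        grad_i = -y[i] * X[i] / (1.0 + np.exp(margin)) + lam * w
        # Reweight by 1/(n * p_i) so the sampled gradient stays an
        # unbiased estimate of the full gradient.
        w -= lr * grad_i / (n * p[i])
    return w

# Usage on synthetic data.
if __name__ == "__main__":
    rng = np.random.default_rng(1)
    X = rng.normal(size=(500, 20))
    y = np.sign(X @ rng.normal(size=20) + 0.1 * rng.normal(size=500))
    w = is_sgd(X, y)
```

In an asynchronous setting of the kind the paper targets, each worker would draw indices from the same fixed distribution p while updating shared parameters without locks; that extension is omitted from this sketch.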
