Asynchronous Stochastic Block Coordinate Descent with Variance Reduction

Asynchronous parallel implementations of stochastic optimization have recently achieved great success in both theory and practice. Lock-free asynchronous implementations are more efficient than those that rely on read or write locks. In this paper, we focus on a composite objective function consisting of a smooth convex function $f$ and a block-separable convex function, a structure that arises widely in machine learning and computer vision. We propose an asynchronous stochastic block coordinate descent algorithm with the variance-reduction acceleration technique (AsySBCDVR), which is lock-free in both its implementation and its analysis. AsySBCDVR is particularly important because it scales well with the sample size and the dimension simultaneously. We prove that AsySBCDVR achieves a linear convergence rate when the function $f$ satisfies the optimal strong convexity property, and a sublinear rate when $f$ is generally convex. More importantly, it attains a near-linear speedup on a parallel shared-memory system.
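To make the setting concrete, below is a minimal serial Python sketch of the kind of variance-reduced stochastic block coordinate update the abstract describes: an SVRG-style gradient estimator combined with a proximal step on one randomly chosen coordinate block. The least-squares loss, L1 regularizer, block size, step size, and epoch structure are illustrative assumptions, and the asynchronous, lock-free machinery that is the paper's main contribution is deliberately omitted.

```python
# Serial sketch of a variance-reduced stochastic block coordinate descent
# update (illustrative; not the paper's exact asynchronous algorithm).
import numpy as np

def soft_threshold(z, tau):
    """Proximal operator of tau * ||.||_1 (a block-separable regularizer)."""
    return np.sign(z) * np.maximum(np.abs(z) - tau, 0.0)

def sbcd_vr(A, b, lam=0.1, step=1e-3, epochs=20, block_size=10, seed=0):
    """min_x (1/2n)||Ax - b||^2 + lam * ||x||_1 via SVRG-style block updates."""
    rng = np.random.default_rng(seed)
    n, d = A.shape
    x = np.zeros(d)
    blocks = [np.arange(s, min(s + block_size, d)) for s in range(0, d, block_size)]
    for _ in range(epochs):
        x_snap = x.copy()                        # snapshot point
        full_grad = A.T @ (A @ x_snap - b) / n   # full gradient at the snapshot
        for _ in range(n):
            i = rng.integers(n)                       # random sample
            blk = blocks[rng.integers(len(blocks))]   # random coordinate block
            a_i = A[i]
            # SVRG variance-reduced partial gradient restricted to the block
            v = (a_i @ x - b[i]) * a_i[blk] \
                - (a_i @ x_snap - b[i]) * a_i[blk] + full_grad[blk]
            # proximal update touching only the chosen block
            x[blk] = soft_threshold(x[blk] - step * v, step * lam)
    return x
```

In the asynchronous, lock-free setting the abstract targets, multiple worker threads would run the inner loop concurrently on a shared iterate without locks, so each update may be computed from a slightly stale copy of `x`; the paper's analysis accounts for this inconsistency, which the serial sketch above ignores.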
