Distributed Asynchronous Stochastic Dual Coordinate Ascent without Duality

In this paper, we propose a new distributed asynchronous dual-free Stochastic Dual Coordinate Ascent method (Asy-dfSDCA) and prove its convergence rate for two cases: when the individual loss is convex, and when the individual loss is non-convex but the expected loss is convex. Stochastic Dual Coordinate Ascent (SDCA) is a popular method that often outperforms stochastic gradient descent in solving regularized convex loss minimization problems. Dual-free SDCA is a variant of SDCA that can also be applied to non-convex problems, where the dual problem is meaningless. In this paper, we extend dual-free SDCA to the distributed asynchronous setting over a star network.
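
As a reference point, the sketch below illustrates the single-machine dual-free SDCA update that the proposed distributed method builds on: each example keeps a pseudo-dual vector, and both the pseudo-dual and primal variables are updated from the loss gradient alone, with no conjugate function. This is an illustrative reconstruction under assumed choices (L2-regularized logistic regression, step size eta, synthetic data), not the paper's Asy-dfSDCA algorithm; the asynchronous star-network communication is omitted.

# A minimal single-machine sketch of the dual-free SDCA update, applied to
# L2-regularized logistic regression. Illustrative only: problem, data, and
# step size eta are assumptions, and the distributed asynchronous part of
# Asy-dfSDCA is not shown.
import numpy as np

def dual_free_sdca(X, y, lam=0.1, eta=0.01, epochs=20, seed=0):
    """Dual-free SDCA for min_w (1/n) sum_i log(1 + exp(-y_i x_i^T w)) + (lam/2)||w||^2."""
    rng = np.random.default_rng(seed)
    n, d = X.shape
    alpha = np.zeros((n, d))         # pseudo-dual vector per example
    w = alpha.mean(axis=0) / lam     # primal iterate kept equal to (1/(lam*n)) * sum_i alpha_i
    for _ in range(epochs):
        for i in rng.permutation(n):
            # gradient of the i-th logistic loss at the current w
            grad_i = -y[i] * X[i] / (1.0 + np.exp(y[i] * X[i].dot(w)))
            delta = grad_i + alpha[i]
            # dual-free update: no conjugate (dual) function is evaluated
            alpha[i] -= eta * lam * n * delta
            w -= eta * delta         # preserves w = (1/(lam*n)) * sum_i alpha_i
    return w

if __name__ == "__main__":
    rng = np.random.default_rng(1)
    X = rng.normal(size=(200, 10))
    w_true = rng.normal(size=10)
    y = np.sign(X.dot(w_true) + 0.1 * rng.normal(size=200))
    w = dual_free_sdca(X, y)
    print(f"training accuracy: {np.mean(np.sign(X.dot(w)) == y):.3f}")

Maintaining the primal iterate as (1/(lam*n)) times the running sum of the pseudo-dual vectors is what removes the need for the conjugate of the loss, which is exactly the property that lets the method be used when the dual problem is meaningless.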
