Asynchronous Stochastic Gradient Descent with Variance Reduction for Non-Convex Optimization

We provide the first theoretical analysis of the convergence rate of the asynchronous stochastic variance reduced gradient (SVRG) algorithm for non-convex optimization. Recent studies have shown that asynchronous stochastic gradient descent (SGD) algorithms with variance reduction converge at a linear rate on convex problems. However, no prior work analyzes asynchronous SGD with variance reduction on non-convex problems. In this paper, we study two asynchronous parallel implementations of SVRG: one on a distributed memory system and the other on a shared memory system. We prove that both algorithms attain a convergence rate of $O(1/T)$, and that linear speedup is achievable when the number of workers is upper bounded.
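
For concreteness, below is a minimal serial sketch of the SVRG update that the analysis builds on, written in NumPy. In the asynchronous variants studied here, workers would evaluate the variance-reduced gradient using a possibly stale copy of the parameters; the function names, step size, and toy least-squares objective are illustrative assumptions, not the paper's implementation.

```python
import numpy as np

def svrg(grad_i, w0, n, step_size=0.01, outer_iters=20, inner_iters=None, rng=None):
    """Serial SVRG sketch (assumed interface): grad_i(w, i) returns the gradient
    of the i-th component function f_i at w; n is the number of components."""
    rng = np.random.default_rng() if rng is None else rng
    m = n if inner_iters is None else inner_iters
    w = w0.copy()
    for _ in range(outer_iters):
        w_snap = w.copy()
        # Full gradient at the snapshot point (the variance-reduction anchor).
        mu = sum(grad_i(w_snap, i) for i in range(n)) / n
        for _ in range(m):
            i = rng.integers(n)
            # Variance-reduced stochastic gradient; in the asynchronous setting,
            # w here would be a (possibly delayed) parameter copy read by a worker.
            v = grad_i(w, i) - grad_i(w_snap, i) + mu
            w -= step_size * v
    return w

# Toy usage (illustrative only): f_i(w) = 0.5 * (a_i^T w - b_i)^2.
if __name__ == "__main__":
    rng = np.random.default_rng(0)
    A, b = rng.normal(size=(100, 5)), rng.normal(size=100)
    grad_i = lambda w, i: (A[i] @ w - b[i]) * A[i]
    print(svrg(grad_i, np.zeros(5), n=100, rng=rng))
```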
