Asynchronous decentralized convex optimization through short-term gradient averaging

This paper considers decentralized convex optimization over a network in large-scale settings, where "large" applies simultaneously to the number of training examples, the dimensionality, and the number of network nodes. We first propose a centralized optimization scheme that generalizes successful existing methods based on gradient averaging, making them more flexible by turning the number of averaged gradients into an explicit parameter of the method. We then propose an asynchronous distributed algorithm that implements this scheme on large decentralized computing networks.
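To make "the number of averaged gradients as an explicit parameter" concrete, here is a minimal, hypothetical sketch in Python. It is not the paper's algorithm: the function name `short_term_averaged_gradient`, the uniform random sampling of examples, the sliding window holding the last `window` sampled gradients, and the constant step size are all assumptions chosen for illustration. With `window=1` the update reduces to plain SGD; a larger window averages more (and staler) gradients, loosely in the spirit of averaging methods such as the stochastic average gradient (SAG).

```python
import random
from collections import deque
from typing import Callable, Deque, List

Vector = List[float]


def short_term_averaged_gradient(
    grad_i: Callable[[Vector, int], Vector],
    x0: Vector,
    n_examples: int,
    window: int,
    step: float,
    n_iters: int,
    seed: int = 0,
) -> Vector:
    """Hypothetical sketch: step along the mean of the last `window`
    sampled per-example gradients (window=1 is plain SGD)."""
    if window < 1:
        raise ValueError("window must be at least 1")
    rng = random.Random(seed)
    x = list(x0)
    recent: Deque[Vector] = deque(maxlen=window)  # last `window` gradients
    running_sum = [0.0] * len(x)  # sum of the gradients currently in `recent`
    for _ in range(n_iters):
        i = rng.randrange(n_examples)  # assumption: uniform random sampling
        g = grad_i(x, i)
        if len(recent) == window:
            # deque(maxlen=...) drops recent[0] on append; remove it from the sum first.
            running_sum = [s - o for s, o in zip(running_sum, recent[0])]
        recent.append(g)
        running_sum = [s + gj for s, gj in zip(running_sum, g)]
        scale = step / len(recent)  # average over the gradients held so far
        x = [xj - scale * sj for xj, sj in zip(x, running_sum)]
    return x


if __name__ == "__main__":
    # Toy least-squares problem: f_i(x) = 0.5 * (a_i . x - b_i)^2.
    data_rng = random.Random(42)
    A = [[data_rng.gauss(0.0, 1.0) for _ in range(3)] for _ in range(50)]
    x_true = [1.0, -2.0, 0.5]
    b = [sum(a * t for a, t in zip(row, x_true)) for row in A]

    def grad(x: Vector, i: int) -> Vector:
        r = sum(a * xj for a, xj in zip(A[i], x)) - b[i]
        return [r * a for a in A[i]]

    for k in (1, 10):
        x_hat = short_term_averaged_gradient(grad, [0.0, 0.0, 0.0], 50, k, 0.01, 10000)
        print(k, [round(v, 4) for v in x_hat])
```

Keeping a running sum makes each update cost O(d) instead of O(k·d), at the price of storing the k most recent gradients. The asynchronous, decentralized part of the paper is not modeled here.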
