SUCAG: Stochastic Unbiased Curvature-aided Gradient Method for Distributed Optimization

We propose and analyze a new stochastic gradient method, which we call Stochastic Unbiased Curvature-aided Gradient (SUCAG), for finite-sum optimization problems. SUCAG is an unbiased total-gradient tracking technique that uses Hessian information to accelerate convergence. We analyze the method under a general asynchronous model of computation, in which each function is selected infinitely often with possibly unbounded (but sublinear) delay. For strongly convex problems, we establish linear convergence of SUCAG. When the initialization point is sufficiently close to the optimal solution, the established convergence rate depends only on the condition number of the problem, making it strictly faster than the known rate for the SAGA method. Furthermore, we describe a Markov-driven approach for implementing SUCAG in a distributed asynchronous multi-agent setting, via gossiping along a random walk on an undirected communication graph. We show that our analysis applies as long as the graph is connected and, notably, establishes an asymptotic linear convergence rate that is robust to the graph topology. Numerical results demonstrate the merits of our algorithm over existing methods.
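To make the idea of unbiased, Hessian-aided gradient tracking concrete, the following is a minimal sketch and not the authors' algorithm. It assumes a SAGA-style estimator in which each stored gradient is corrected by a first-order Taylor term built from a stored Hessian; the test problem (L2-regularized logistic regression), the step size, uniform sampling, and all function names are illustrative assumptions. The asynchronous delay model and the random-walk gossip scheme from the abstract are not modeled here.

```python
import numpy as np

def make_problem(n=20, d=5, lam=0.1, seed=0):
    # Hypothetical test problem: f_i(x) = log(1 + exp(-b_i a_i^T x)) + lam/2 ||x||^2
    rng = np.random.default_rng(seed)
    A = rng.normal(size=(n, d))
    b = rng.choice([-1.0, 1.0], size=n)
    return A, b, lam

def grad_i(x, a, b, lam):
    s = 1.0 / (1.0 + np.exp(b * a.dot(x)))  # sigmoid(-b a^T x)
    return -b * s * a + lam * x

def hess_i(x, a, b, lam):
    s = 1.0 / (1.0 + np.exp(b * a.dot(x)))
    return s * (1.0 - s) * np.outer(a, a) + lam * np.eye(x.size)

def full_grad(x, A, b, lam):
    return np.mean([grad_i(x, A[j], b[j], lam) for j in range(len(b))], axis=0)

def taylor_grads(x, Y, G, H):
    # Curvature-aided approximation of every component gradient at x:
    # grad f_j(x) ~ G[j] + H[j] (x - Y[j]), using memory taken at Y[j].
    return G + np.einsum("nij,nj->ni", H, x - Y)

def estimator(x, i, A, b, lam, Y, G, H):
    # Assumed SAGA-style form: exact gradient of the sampled component,
    # minus its Taylor approximation, plus the average approximation.
    # Averaging over i recovers the full gradient, so with uniform
    # sampling the estimator is unbiased.
    approx = taylor_grads(x, Y, G, H)
    return grad_i(x, A[i], b[i], lam) - approx[i] + approx.mean(axis=0)

def run(A, b, lam, steps=2000, lr=0.1, seed=1):
    rng = np.random.default_rng(seed)
    n, d = A.shape
    x = np.zeros(d)
    Y = np.tile(x, (n, 1))  # point at which each component was last visited
    G = np.array([grad_i(x, A[j], b[j], lam) for j in range(n)])
    H = np.array([hess_i(x, A[j], b[j], lam) for j in range(n)])
    for _ in range(steps):
        i = rng.integers(n)  # uniform sampling (an assumption of this sketch)
        v = estimator(x, i, A, b, lam, Y, G, H)
        # Refresh the memory of component i at the current iterate.
        Y[i] = x
        G[i] = grad_i(x, A[i], b[i], lam)
        H[i] = hess_i(x, A[i], b[i], lam)
        x = x - lr * v
    return x
```

Calling `run(*make_problem())` returns an approximate minimizer. For clarity the sketch recomputes the Taylor terms for all components at every step; a practical implementation would instead maintain running sums of the stored gradients and Hessians.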
