Accelerated Gossip via Stochastic Heavy Ball Method

In this paper we show how the stochastic heavy ball method (SHB), a popular method for solving stochastic convex and non-convex optimization problems, operates as a randomized gossip algorithm. In particular, we focus on two special cases of SHB: the Randomized Kaczmarz method with momentum and its block variant. Building upon a recent framework for the design and analysis of randomized gossip algorithms [20], we interpret the distributed nature of the proposed methods. We present novel protocols for solving the average consensus problem in which, at each step, all nodes of the network update their values but only a subset of them exchange their private values. We also present numerical experiments on popular wireless sensor networks that show the benefits of our protocols.
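As a rough illustration of the kind of protocol described above, the following Python sketch applies a Randomized Kaczmarz step with momentum to average consensus. It is a minimal sketch under stated assumptions, not the paper's implementation: the unit stepsize, the momentum value `beta=0.3`, the cycle topology, the uniform edge sampling, and the function name `momentum_gossip` are all hypothetical choices made for the example. In each step every node adds a local momentum term, while only the two endpoints of the sampled edge exchange values.

```python
import random


def momentum_gossip(values, edges, beta=0.3, steps=20000, seed=0):
    """Pairwise gossip with a heavy ball momentum term (illustrative sketch).

    values: initial private values, one per node
    edges:  list of (i, j) node pairs that are allowed to communicate
    beta:   momentum parameter (assumed value, not taken from the paper)
    """
    rng = random.Random(seed)
    x_prev = list(values)  # iterate at step k-1 (equal to x at the start)
    x = list(values)       # iterate at step k
    for _ in range(steps):
        i, j = rng.choice(edges)  # sample one edge uniformly at random
        # All nodes update: each adds momentum using only its own history.
        x_next = [xk + beta * (xk - xp) for xk, xp in zip(x, x_prev)]
        # Only nodes i and j exchange values: they average, then add momentum.
        avg = 0.5 * (x[i] + x[j])
        x_next[i] = avg + beta * (x[i] - x_prev[i])
        x_next[j] = avg + beta * (x[j] - x_prev[j])
        x_prev, x = x, x_next
    return x


# Hypothetical example: 10 nodes on a cycle, node k starts with value k.
n = 10
cycle_edges = [(k, (k + 1) % n) for k in range(n)]
initial = [float(k) for k in range(n)]
final = momentum_gossip(initial, cycle_edges)
```

Setting `beta=0` in this sketch recovers plain randomized pairwise gossip. Because both iterates start from the same vector, the momentum terms sum to zero across the network, so the network average is preserved at every step.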

[1] Dumitru Erhan, et al. Going deeper with convolutions, 2014, 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[2] Peter Richtárik, et al. Momentum and stochastic momentum for stochastic gradient, Newton, proximal point and subspace descent methods, 2017, Computational Optimization and Applications.

[3] Deanna Needell, et al. Convergence Properties of the Randomized Extended Gauss-Seidel and Kaczmarz Methods, 2015, SIAM J. Matrix Anal. Appl.

[4] D. Needell. Randomized Kaczmarz solver for noisy linear systems, 2009, 0902.0958.

[5] Peter Richtárik, et al. Stochastic Dual Ascent for Solving Linear Systems, 2015, ArXiv.

[6] John N. Tsitsiklis, et al. Distributed Asynchronous Deterministic and Stochastic Gradient Optimization Algorithms, 1984, 1984 American Control Conference.

[7] Nikolaos M. Freris, et al. Randomized gossip algorithms for solving Laplacian systems, 2015, 2015 European Control Conference (ECC).

[8] Geoffrey E. Hinton, et al. On the importance of initialization and momentum in deep learning, 2013, ICML.

[9] Stephen P. Boyd, et al. Fast linear iterations for distributed averaging, 2003, 42nd IEEE International Conference on Decision and Control (IEEE Cat. No.03CH37475).

[10] Stephen P. Boyd, et al. Randomized gossip algorithms, 2006, IEEE Transactions on Information Theory.

[11] Peter Richtárik, et al. Randomized Iterative Methods for Linear Systems, 2015, SIAM J. Matrix Anal. Appl.

[12] Soummya Kar, et al. Gossip Algorithms for Distributed Signal Processing, 2010, Proceedings of the IEEE.

[13] Geoffrey E. Hinton, et al. ImageNet classification with deep convolutional neural networks, 2012, Commun. ACM.

[14] Anand D. Sarwate, et al. Broadcast Gossip Algorithms for Consensus, 2009, IEEE Transactions on Signal Processing.

[15] R. Vershynin, et al. A Randomized Kaczmarz Algorithm with Exponential Convergence, 2007, math/0702226.

[16] George Cybenko, et al. Dynamic Load Balancing for Distributed Memory Multiprocessors, 1989, J. Parallel Distributed Comput.

[17] Alexandros G. Dimakis, et al. Geographic Gossip: Efficient Averaging for Sensor Networks, 2007, IEEE Transactions on Signal Processing.

[18] Michael G. Rabbat, et al. Network Topology and Communication-Computation Tradeoffs in Decentralized Optimization, 2017, Proceedings of the IEEE.

[19] John N. Tsitsiklis, et al. Convergence Speed in Distributed Consensus and Averaging, 2009, SIAM J. Control. Optim.

[20] Stephen P. Boyd, et al. A scheme for robust distributed sensor fusion based on average consensus, 2005, IPSN 2005. Fourth International Symposium on Information Processing in Sensor Networks, 2005.

[21] Stephen J. Wright, et al. An accelerated randomized Kaczmarz algorithm, 2013, Math. Comput.

[22] M. Degroot. Reaching a Consensus, 1974.

[23] Yonina C. Eldar, et al. Acceleration of randomized Kaczmarz method via the Johnson–Lindenstrauss Lemma, 2010, Numerical Algorithms.

[24] Peter Richtárik, et al. Privacy preserving randomized gossip algorithms, 2017, 1706.07636.

[25] D. Needell, et al. Randomized block Kaczmarz method with projection for solving least squares, 2014, 1403.4192.

[26] Daniel A. Spielman, et al. Accelerated Gossip Algorithms for Distributed Computation, 2006.

[27] Necdet Serhat Aybat, et al. Decentralized computation of effective resistances and acceleration of consensus algorithms, 2017, 2017 IEEE Global Conference on Signal and Information Processing (GlobalSIP).

[28] Nikolaos M. Freris, et al. Fast distributed smoothing of relative measurements, 2012, 2012 IEEE 51st IEEE Conference on Decision and Control (CDC).

[29] Dirk A. Lorenz, et al. Linear convergence of the randomized sparse Kaczmarz method, 2016, Mathematical Programming.

[30] Peter Richtárik, et al. A new perspective on randomized gossip algorithms, 2016, 2016 IEEE Global Conference on Signal and Information Processing (GlobalSIP).

[31] Peter Richtárik, et al. Linearly convergent stochastic heavy ball method for minimizing generalization error, 2017, ArXiv.

[32] Nathan Srebro, et al. The Marginal Value of Adaptive Gradient Methods in Machine Learning, 2017, NIPS.

[33] Jun Ye Yu, et al. Performance comparison of randomized gossip, broadcast gossip and collection tree protocol for distributed averaging, 2013, 2013 5th IEEE International Workshop on Computational Advances in Multi-Sensor Adaptive Processing (CAMSAP).

[34] Alexandros G. Dimakis, et al. Order-Optimal Consensus Through Randomized Path Averaging, 2010, IEEE Transactions on Information Theory.

[35] Deanna Needell, et al. Paved with Good Intentions: Analysis of a Randomized Block Kaczmarz Method, 2012, ArXiv.

[36] Nikolaos M. Freris, et al. Randomized Extended Kaczmarz for Solving Least Squares, 2012, SIAM J. Matrix Anal. Appl.

[37] Alex Olshevsky, et al. Linear Time Average Consensus on Fixed Graphs and Implications for Decentralized Optimization and Multi-Agent Control, 2014, 1411.4186.

[38] Peter Richtárik, et al. Stochastic Reformulations of Linear Systems: Algorithms and Convergence Theory, 2017, SIAM J. Matrix Anal. Appl.