Distributed Gradient Methods for Convex Machine Learning Problems in Networks: Distributed Optimization

This article provides an overview of distributed gradient methods for solving convex machine learning problems of the form $\min_{x \in \mathbb{R}^n} \frac{1}{m} \sum_{i=1}^{m} f_i(x)$ in a system of $m$ agents embedded in a communication network. Each agent $i$ holds a collection of data that is captured by its privately known objective function $f_i(x)$. The distributed algorithms considered here obey two simple rules: the privately known agent function $f_i(x)$ cannot be disclosed to any other agent in the network, and every agent is aware only of the local connectivity structure of the network, i.e., it knows only its one-hop neighbors. While obeying these two rules, the algorithms that the agents execute should find a solution to the overall system problem despite each agent's limited knowledge of the objective function and limited local communication. The article surveys such algorithms, which typically involve two update steps: a gradient step based on the agent's local objective function and a mixing step that diffuses relevant information from one agent to all other agents in the network.
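The two update steps can be illustrated with a minimal sketch of one common scheme of this kind, often written as $x_i^{k+1} = \sum_{j} w_{ij} x_j^{k} - \alpha_k \nabla f_i(x_i^{k})$. Everything specific here is an assumption made for illustration and is not taken from the article: the ring topology, the Metropolis mixing weights, the scalar quadratic objectives $f_i(x) = \tfrac{1}{2}(x - b_i)^2$, the step size $\alpha_k = 1/(k+1)$, and all function names. Agents exchange only their current estimates, never their objective functions.

```python
def ring_neighbors(m):
    """One-hop neighbors of each agent on an undirected ring (assumed topology, m >= 3)."""
    return [[(i - 1) % m, (i + 1) % m] for i in range(m)]


def metropolis_weights(neighbors):
    """Build a doubly stochastic mixing matrix from local degree information.

    Each off-diagonal weight depends only on the degrees of the two
    endpoints, so an agent can compute its row by talking to neighbors.
    """
    m = len(neighbors)
    W = [[0.0] * m for _ in range(m)]
    for i in range(m):
        for j in neighbors[i]:
            W[i][j] = 1.0 / (1 + max(len(neighbors[i]), len(neighbors[j])))
        W[i][i] = 1.0 - sum(W[i])  # self-weight makes the row sum to one
    return W


def distributed_gradient_descent(local_grads, W, x0, num_iters):
    """Run the mixing-plus-gradient iteration for scalar decision variables.

    local_grads[i] is agent i's private gradient oracle; it is only ever
    evaluated at agent i's own estimate.
    """
    m = len(local_grads)
    x = list(x0)
    for k in range(num_iters):
        step = 1.0 / (k + 1)  # diminishing step size (assumed schedule)
        # Mixing step: W[i][j] is zero unless j is i or a one-hop neighbor
        # of i, so each sum uses only locally communicated estimates.
        mixed = [sum(W[i][j] * x[j] for j in range(m)) for i in range(m)]
        # Gradient step on the agent's own objective.
        x = [mixed[i] - step * local_grads[i](x[i]) for i in range(m)]
    return x


# Hypothetical private data: agent i wants x close to targets[i].
targets = [1.0, 2.0, 3.0, 4.0, 5.0]
local_grads = [lambda x, b=b: x - b for b in targets]  # gradient of 0.5*(x-b)^2
W = metropolis_weights(ring_neighbors(len(targets)))
estimates = distributed_gradient_descent(local_grads, W, [0.0] * len(targets), 2000)
print(estimates)
```

For these quadratic objectives the minimizer of the average is the mean of the targets, so every agent's estimate should approach 3.0 even though no agent sees another agent's target. With a constant step size instead of a diminishing one, this basic scheme would generally settle only in a neighborhood of the solution.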
