Finite-Time Analysis of Decentralized Stochastic Approximation with Applications in Multi-Agent and Multi-Task Learning

Stochastic approximation, a data-driven approach for finding the fixed point of an unknown operator, provides a unified framework for treating many problems in stochastic optimization and reinforcement learning. Motivated by the growing interest in multi-agent and multi-task learning, we consider in this paper a decentralized variant of stochastic approximation. A network of agents, each with its own unknown operator and data observations, cooperatively finds the fixed point of the aggregate operator. Each agent runs a local stochastic approximation algorithm using noisy samples from its operator while averaging its iterates with those of its neighbors over a decentralized communication graph. Our main contribution is a finite-time analysis of this decentralized stochastic approximation algorithm, characterizing the impact of the communication topology connecting the agents. We model the data observed at each agent as samples from a Markov process; this lack of independence makes the iterates biased and (potentially) unbounded. Under mild assumptions on the Markov processes, we show that the convergence rate of the proposed method is essentially the same as if the samples were independent, differing only by a logarithmic factor that captures the mixing time of the Markov process. We also present applications of the proposed method to a number of interesting learning problems in multi-agent systems, including a decentralized variant of Q-learning for solving multi-task reinforcement learning.
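To make the algorithmic template concrete, below is a minimal, self-contained sketch of one way such a decentralized update could look. It is illustrative only and not the paper's implementation: the local linear operators, the ring-graph mixing matrix W, the step-size schedule, and the AR(1) noise used as a simple stand-in for Markovian sampling are all assumptions made for this example.

```python
import numpy as np

# Illustrative sketch of decentralized stochastic approximation (not the paper's code).
# Each agent i observes a noisy version of a local linear operator F_i(x) = A_i x + b_i,
# and the network seeks the fixed point x* of the average operator (1/N) * sum_i F_i.
# W is a doubly stochastic mixing matrix encoding the communication graph (a ring here).

rng = np.random.default_rng(0)
N, d, T = 5, 3, 5000                      # number of agents, dimension, iterations

def random_contraction(rho=0.8):
    """A_i with spectral norm rho < 1, so the average operator is a contraction."""
    M = rng.standard_normal((d, d))
    return rho * M / np.linalg.norm(M, 2)

A = [random_contraction() for _ in range(N)]   # hypothetical local operators
b = [rng.standard_normal(d) for _ in range(N)]

# Doubly stochastic mixing matrix for a ring graph.
W = np.zeros((N, N))
for i in range(N):
    W[i, i] = 0.5
    W[i, (i - 1) % N] = W[i, (i + 1) % N] = 0.25

x = np.zeros((N, d))        # one local iterate per agent
noise = np.zeros((N, d))    # AR(1) noise: a crude stand-in for Markovian samples
for t in range(T):
    alpha = 1.0 / (t + 10)  # diminishing step size
    noise = 0.8 * noise + 0.2 * rng.standard_normal((N, d))
    x_mixed = W @ x         # consensus step: average iterates with neighbors
    for i in range(N):
        # Local SA step toward the fixed point of F_i, using the agent's own noisy sample.
        x[i] = x_mixed[i] + alpha * (A[i] @ x[i] + b[i] + noise[i] - x[i])

# Compare the network average against the fixed point of the average operator.
A_bar, b_bar = sum(A) / N, sum(b) / N
x_star = np.linalg.solve(np.eye(d) - A_bar, b_bar)
print("distance to fixed point:", np.linalg.norm(x.mean(axis=0) - x_star))
```

With a connected graph and a doubly stochastic W, the consensus step keeps the local iterates close to their network average, while the local stochastic approximation steps drive that average toward the fixed point of the aggregate operator; this interplay between mixing over the graph and the Markovian sampling noise is what the finite-time analysis quantifies.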
