Towards Resilience for Multi-Agent QD-Learning

This paper considers the multi-agent reinforcement learning (MARL) problem for a networked (peer-to-peer) system in the presence of Byzantine agents. We build on an existing distributed Q-learning algorithm and allow certain agents in the network to behave in an arbitrary and adversarial manner (as captured by the Byzantine attack model). Under the proposed algorithm, if the network topology is (2F + 1)-robust and up to F Byzantine agents exist in the neighborhood of each regular agent, we establish the almost sure convergence of every regular agent's value function to a neighborhood of the optimal value function of the regular agents. For each state, if the optimal Q-values of the regular agents corresponding to different actions are sufficiently separated, our approach allows each regular agent to learn the optimal policy of the regular agents.
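The guarantee rests on each regular agent filtering its neighbors' reported Q-estimates before mixing them into its own update. The sketch below illustrates one way such a step can look, combining a W-MSR-style trimmed consensus over neighbor values with a standard temporal-difference innovation; the function name, step sizes, and trimming rule are illustrative assumptions, not the paper's exact update rule.

```python
import numpy as np

def resilient_q_update(q_own, q_neighbors, reward, q_next_max,
                       alpha=0.5, beta=0.1, gamma=0.95, F=1):
    """Illustrative resilient update for one (state, action) Q-value.

    q_own       : the agent's current estimate Q(s, a)
    q_neighbors : neighbors' reported estimates of Q(s, a); some may be Byzantine
    reward      : locally observed reward r(s, a)
    q_next_max  : max over a' of Q(s', a') from the agent's own table
    F           : assumed bound on Byzantine agents per neighborhood
    """
    vals = np.asarray(q_neighbors, dtype=float)

    # W-MSR-style trimming: discard up to F neighbor values strictly above
    # the agent's own estimate and up to F strictly below it.
    above = np.sort(vals[vals > q_own])
    below = np.sort(vals[vals < q_own])
    equal = vals[vals == q_own]
    above_kept = above[:max(len(above) - F, 0)]   # drop the F largest
    below_kept = below[min(F, len(below)):]       # drop the F smallest
    kept = np.concatenate(([q_own], below_kept, equal, above_kept))

    # Consensus step: move toward the average of the retained estimates.
    consensus = q_own + alpha * (kept.mean() - q_own)

    # Innovation step: standard Q-learning temporal-difference correction.
    td_error = reward + gamma * q_next_max - q_own
    return consensus + beta * td_error
```

The design intuition behind this kind of filter is that discarding up to F extreme values on each side leaves only neighbor estimates that are bracketed by values held by regular agents, so a single Byzantine neighbor reporting an arbitrarily large or small Q-value cannot drag the consensus step outside the range of regular agents' estimates; the (2F + 1)-robustness condition ensures enough redundant connectivity remains for the regular agents to still reach agreement after trimming.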
