Distributed policy evaluation via inexact ADMM in multi-agent reinforcement learning

This paper studies distributed policy evaluation in multi-agent reinforcement learning. In the cooperative setting, each agent observes only its local reward, while all agents share a common environmental state. To optimize the global return, defined as the sum of the local returns, the agents exchange information with their neighbors over a communication network. The mean squared projected Bellman error minimization problem is reformulated as a constrained convex optimization problem with a consensus constraint, and a distributed alternating direction method of multipliers (ADMM) algorithm is proposed to solve it. Furthermore, an inexact step is used in ADMM to keep the computation at each iteration efficient. The convergence of the proposed algorithm is established.
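To make the consensus reformulation concrete, the following is a minimal sketch of a linearized (inexact) decentralized ADMM iteration on a toy problem. It is not the paper's algorithm: the local objective f_i(theta) = 0.5 * ||A theta - b_i||^2 is a simplified stand-in for a projected Bellman error term, and the matrix `A`, the vectors `b_local`, the ring network `neighbors`, the parameters `c` and `rho`, and the function names are all assumptions chosen for illustration. Each agent takes one gradient-type step instead of solving its local subproblem exactly, then updates a dual variable from its disagreement with its neighbors.

```python
import numpy as np


def disagreement(theta, neighbors):
    """Row i holds sum over neighbors j of (theta_i - theta_j)."""
    return np.stack([
        len(nb) * theta[i] - theta[nb].sum(axis=0)
        for i, nb in enumerate(neighbors)
    ])


def inexact_consensus_admm(A, b_local, neighbors, c=1.0, rho=2.0, iters=500):
    """Linearized decentralized ADMM for min sum_i 0.5*||A theta - b_i||^2.

    Hypothetical illustration: agent i only uses b_local[i] and the
    current iterates of its neighbors. rho is assumed to be at least the
    largest eigenvalue of A^T A so that the linearized step is stable.
    """
    n_agents, dim = b_local.shape
    theta = np.zeros((n_agents, dim))   # local copies of the parameter
    alpha = np.zeros((n_agents, dim))   # dual variables, one per agent
    deg = np.array([len(nb) for nb in neighbors], dtype=float)
    step = 1.0 / (2.0 * c * deg + rho)  # per-agent step size

    for _ in range(iters):
        # Row i is the local gradient A^T (A theta_i - b_i).
        grad = (theta @ A.T - b_local) @ A
        # Inexact primal step: one gradient-type update per agent.
        direction = grad + alpha + c * disagreement(theta, neighbors)
        theta = theta - step[:, None] * direction
        # Dual step driven by the remaining disagreement with neighbors.
        alpha = alpha + c * disagreement(theta, neighbors)
    return theta


# Toy data (assumed): 4 agents on a ring, 2-dimensional parameter.
A = np.array([[1.0, 0.2], [0.1, 0.8]])
b_local = np.array([[1.0, 0.0], [0.5, 0.2], [0.0, 1.0], [0.3, 0.7]])
neighbors = [[1, 3], [0, 2], [1, 3], [0, 2]]

theta = inexact_consensus_admm(A, b_local, neighbors)
# With invertible A, the minimizer of the summed objective solves
# A theta = mean_i(b_i), so every row of theta should approach it.
centralized = np.linalg.solve(A, b_local.mean(axis=0))
print(np.abs(theta - centralized).max())
```

In this toy setting every agent's row of `theta` should approach the centralized solution even though no agent ever sees another agent's `b_i`; only parameter iterates are exchanged with neighbors.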
