Paper Information - Multi-Agent Trust Region Policy Optimization

We extend trust region policy optimization (TRPO) to multi-agent reinforcement learning (MARL) problems. We show that the policy update of TRPO can be transformed into a distributed consensus optimization problem for multi-agent cases. By making a series of approximations to the consensus optimization model, we propose a decentralized MARL algorithm, which we call multi-agent TRPO (MATRPO). This algorithm can optimize distributed policies based on local observations and private rewards. The agents do not need to know observations, rewards, policies or value/action-value functions of other agents. The agents only share a likelihood ratio with their neighbors during the training process. The algorithm is fully decentralized and privacy-preserving. Our experiments on two cooperative games demonstrate its robust performance on complicated MARL tasks.
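The role of the shared likelihood ratio can be illustrated with a small numerical sketch. It is not the MATRPO algorithm itself. It assumes, for illustration only, that the joint policy factorizes into independent per-agent policies and that the team advantage is the sum of the agents' private-reward advantages; neither detail is stated in the abstract. Under those assumptions the joint likelihood ratio is the product of the local ratios, and the TRPO surrogate objective splits into one term per agent, each needing only that agent's own advantage plus the joint ratio. All function names and numbers below are hypothetical.

```python
import numpy as np

def local_ratio(new_probs, old_probs, actions):
    """Per-sample likelihood ratio pi_new(a|o) / pi_old(a|o) for one agent."""
    idx = np.arange(len(actions))
    return new_probs[idx, actions] / old_probs[idx, actions]

def surrogate(ratio, advantages):
    """TRPO-style surrogate objective: sample mean of ratio * advantage."""
    return float(np.mean(ratio * advantages))

def mean_kl(old_probs, new_probs):
    """Sample mean of KL(pi_old || pi_new) for categorical policies."""
    return float(np.mean(np.sum(old_probs * np.log(old_probs / new_probs), axis=1)))

# Two agents, three samples, two actions each (made-up numbers).
old_1 = np.full((3, 2), 0.5)
new_1 = np.tile([0.6, 0.4], (3, 1))
old_2 = np.full((3, 2), 0.5)
new_2 = np.tile([0.45, 0.55], (3, 1))

actions_1 = np.array([0, 1, 0])
actions_2 = np.array([1, 1, 0])

# Advantages estimated from each agent's private reward (assumed given).
adv_1 = np.array([1.0, -0.5, 0.2])
adv_2 = np.array([0.3, 0.4, -0.1])

ratio_1 = local_ratio(new_1, old_1, actions_1)
ratio_2 = local_ratio(new_2, old_2, actions_2)

# Assumption: factorized joint policy, so the joint ratio is a product.
joint_ratio = ratio_1 * ratio_2

# Assumption: team advantage is the sum of the private advantages, so the
# joint surrogate equals the sum of per-agent surrogates that all use the
# same joint ratio.
joint_obj = surrogate(joint_ratio, adv_1 + adv_2)
split_obj = surrogate(joint_ratio, adv_1) + surrogate(joint_ratio, adv_2)

# Trust-region quantity for agent 1: mean KL between old and new policy.
kl_1 = mean_kl(old_1, new_1)
```

In this toy setting the ratio is the only quantity that couples the per-agent terms, which is consistent with the abstract's claim that agents share only a likelihood ratio. How the agents come to agree on that ratio (the consensus step) is not modeled here.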

Haibo He | Hepeng Li
