Multi-Agent Reinforcement Learning: A Report on Challenges and Approaches

Reinforcement Learning (RL) is a learning paradigm concerned with learning to control a system so as to maximize an objective over the long term. This approach to learning has received immense interest in recent times and success manifests itself in the form of human-level performance on games like \textit{Go}. While RL is emerging as a practical component in real-life systems, most successes have been in Single Agent domains. This report will instead specifically focus on challenges that are unique to Multi-Agent Systems interacting in mixed cooperative and competitive environments. The report concludes with advances in the paradigm of training Multi-Agent Systems called \textit{Decentralized Actor, Centralized Critic}, based on an extension of MDPs called \textit{Decentralized Partially Observable MDP}s, which has seen a renewed interest lately.

[1]  Yuval Tassa,et al.  Continuous control with deep reinforcement learning , 2015, ICLR.

[2]  Tucker Balch,et al.  Learning Roles: Behavioral Diversity in Robot Teams , 1997 .

[3]  Michael L. Littman,et al.  Markov Games as a Framework for Multi-Agent Reinforcement Learning , 1994, ICML.

[4]  Jakub W. Pachocki,et al.  Emergent Complexity via Multi-Agent Competition , 2017, ICLR.

[5]  Andrew Y. Ng,et al.  Policy Invariance Under Reward Transformations: Theory and Application to Reward Shaping , 1999, ICML.

[6]  Sean R Eddy,et al.  What is dynamic programming? , 2004, Nature Biotechnology.

[7]  Dimitri P. Bertsekas,et al.  Dynamic Programming and Optimal Control, Two Volume Set , 1995 .

[8]  Bikramjit Banerjee,et al.  Multi-agent reinforcement learning as a rehearsal for decentralized planning , 2016, Neurocomputing.

[9]  Andrew Zisserman,et al.  Kickstarting Deep Reinforcement Learning , 2018, ArXiv.

[10]  Richard S. Sutton,et al.  Temporal credit assignment in reinforcement learning , 1984 .

[11]  Peter Dayan,et al.  Q-learning , 1992, Machine Learning.

[12]  Jürgen Schmidhuber,et al.  Reinforcement Learning in Markovian and Non-Markovian Environments , 1990, NIPS.

[13]  Marvin Minsky,et al.  Steps toward Artificial Intelligence , 1995, Proceedings of the IRE.

[14]  Shane Legg,et al.  Human-level control through deep reinforcement learning , 2015, Nature.

[15]  Yoav Shoham,et al.  If multi-agent learning is the answer, what is the question? , 2007, Artif. Intell..

[16]  Claudio Gentile,et al.  Boltzmann Exploration Done Right , 2017, NIPS.

[17]  Frans A. Oliehoek,et al.  A Concise Introduction to Decentralized POMDPs , 2016, SpringerBriefs in Intelligent Systems.

[18]  Joel Z. Leibo,et al.  Multi-agent Reinforcement Learning in Sequential Social Dilemmas , 2017, AAMAS.

[19]  Laurent Jeanpierre,et al.  Coordinated Multi-Robot Exploration Under Communication Constraints Using Decentralized Markov Decision Processes , 2012, AAAI.

[20]  Shimon Whiteson,et al.  Counterfactual Multi-Agent Policy Gradients , 2017, AAAI.

[21]  Yi Wu,et al.  Multi-Agent Actor-Critic for Mixed Cooperative-Competitive Environments , 2017, NIPS.

[22]  Alex A. Freitas,et al.  Evolutionary Computation , 2002 .

[23]  Guy Lever,et al.  Deterministic Policy Gradient Algorithms , 2014, ICML.

[24]  Xi Chen,et al.  Evolution Strategies as a Scalable Alternative to Reinforcement Learning , 2017, ArXiv.

[25]  Alex Graves,et al.  Asynchronous Methods for Deep Reinforcement Learning , 2016, ICML.

[26]  Shimon Whiteson,et al.  QMIX: Monotonic Value Function Factorisation for Deep Multi-Agent Reinforcement Learning , 2018, ICML.

[27]  Guy Lever,et al.  Value-Decomposition Networks For Cooperative Multi-Agent Learning Based On Team Reward , 2018, AAMAS.

[28]  Karl Tuyls,et al.  Evolutionary Dynamics of Multi-Agent Learning: A Survey , 2015, J. Artif. Intell. Res..

[29]  M. Pipattanasomporn,et al.  Multi-agent systems in a distributed smart grid: Design and implementation , 2009, 2009 IEEE/PES Power Systems Conference and Exposition.

[30]  Eduardo F. Morales,et al.  An Introduction to Reinforcement Learning , 2011 .

[31]  Kagan Tumer,et al.  A multiagent approach to managing air traffic flow , 2010, Autonomous Agents and Multi-Agent Systems.

[32]  Tom Schaul,et al.  StarCraft II: A New Challenge for Reinforcement Learning , 2017, ArXiv.

[33]  Ronald J. Williams,et al.  Simple Statistical Gradient-Following Algorithms for Connectionist Reinforcement Learning , 2004, Machine Learning.