Multiagent Deep Reinforcement Learning: Challenges and Directions Towards Human-Like Approaches

This paper surveys the field of multiagent deep reinforcement learning. The combination of deep neural networks with reinforcement learning has gained traction in recent years, and the focus is slowly shifting from single-agent to multiagent environments. Dealing with multiple agents is inherently more complex because (a) future rewards depend on the joint actions of multiple players and (b) the computational complexity of functions increases. We present the most common multiagent problem representations and their main challenges, and identify five research areas that address one or more of these challenges: centralised training and decentralised execution, opponent modelling, communication, efficient coordination, and reward shaping. We find that many computational studies rely on unrealistic assumptions or do not generalise to other settings, and that they struggle to overcome the curse of dimensionality or nonstationarity. Approaches from psychology and sociology capture promising, relevant behaviours such as communication and coordination. We suggest that, for multiagent reinforcement learning to be successful, future research should address these challenges with an interdisciplinary approach that opens up new possibilities for more human-oriented solutions.
