Deep Reinforcement Learning for Multiagent Systems: A Review of Challenges, Solutions, and Applications

Reinforcement learning (RL) algorithms have been around for decades and have been employed to solve various sequential decision-making problems. These algorithms, however, have faced great challenges when dealing with high-dimensional environments. The recent development of deep learning has enabled RL methods to derive optimal policies for sophisticated and capable agents, which can perform efficiently in these challenging environments. This article addresses an important aspect of deep RL: situations that require multiple agents to communicate and cooperate to solve complex tasks. A survey of different approaches to problems in multiagent deep RL (MADRL) is presented, covering nonstationarity, partial observability, continuous state and action spaces, multiagent training schemes, and multiagent transfer learning. The merits and demerits of the reviewed methods are analyzed and discussed, and their corresponding applications are explored. It is envisaged that this review provides insights into various MADRL methods and can lead to the future development of more robust and highly useful multiagent learning methods for solving real-world problems.
