Deep Reinforcement Learning for Multi-Agent Systems: A Review of Challenges, Solutions and Applications

Reinforcement learning (RL) algorithms have been around for decades and have been employed to solve various sequential decision-making problems. These algorithms, however, have faced great challenges when dealing with high-dimensional environments. The recent development of deep learning has enabled RL methods to derive optimal policies for sophisticated and capable agents, which can perform efficiently in these challenging environments. This paper addresses an important aspect of deep RL related to situations that require multiple agents to communicate and cooperate to solve complex tasks. A survey of different approaches to problems related to multi-agent deep RL (MADRL) is presented, including non-stationarity, partial observability, continuous state and action spaces, multi-agent training schemes, and multi-agent transfer learning. The merits and demerits of the reviewed methods are analyzed and discussed, and their corresponding applications are explored. It is envisaged that this review provides insights into various MADRL methods and can lead to the future development of more robust and highly useful multi-agent learning methods for solving real-world problems.
