Multi-agent behavioral control system using deep reinforcement learning

Abstract: Deep reinforcement learning (DRL) has emerged as the dominant approach to building human-level agents. By using neural networks as decision-making controllers, DRL extends traditional reinforcement learning methods to address the curse of dimensionality in complex tasks. However, agents in complex environments are likely to get stuck in sub-optimal solutions. In such cases, the agent inadvertently turns into a “zombie” owing to its short-term vision and harmful behaviors. In this study, we use human learning strategies to adjust agent behaviors in high-dimensional environments. As a result, the agent behaves predictably and attains its designated goal. The contribution of this study is two-fold. First, we introduce a lightweight workflow that enables a nonexpert to preserve a certain level of safety in AI systems. Specifically, the workflow involves a novel concept of a target map and a multi-agent behavioral control system named Multi-Policy Control System (MPCS). MPCS controls agent behaviors in real time without the burden of human feedback. Second, we develop a multi-agent game named Tank Battle that provides a configurable environment to examine agent behaviors and human-agent interactions in DRL. Finally, simulation results show that agents guided by MPCS outperform agents that do not use MPCS with respect to mean total reward and human-like behaviors in complex environments such as Seaquest and Tank Battle.
