Multi-agent Robust Time Differential Reinforcement Learning Over Communicated Networks

Recently, research on multi-agent reinforcement learning (MARL) has attracted tremendous interest across many applications, especially autonomous driving. A central problem in MARL is how to deal with uncertainty in the environment and the interactions among connected agents. To address this problem, a distributed robust temporal-difference deep Q-network algorithm (MARTD-DQN) is developed in this paper. MARTD-DQN consists of two parts: a decentralized MARL algorithm (DMARL) and a robust TD deep Q-network algorithm (RTD-DQN). DMARL improves the robustness of policy estimation by fusing the states of neighboring agents over the communication network, while RTD-DQN improves robustness to outliers through online estimation of the uncertainty. By combining the two algorithms, the proposed method is robust not only to node failures but also to outliers. The proposed algorithm is then applied to adaptive cruise control (ACC) simulations of autonomous cars, and the simulation results demonstrate its effectiveness.
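
The abstract describes the algorithm only at a high level. As a rough illustration of the two ingredients it names, consensus-style fusion of neighbor estimates over the communication network and a TD update made robust to outliers via an online uncertainty estimate, here is a minimal sketch in Python. Everything specific in it is an assumption rather than the paper's method: tabular Q-values stand in for the deep Q-network, a fixed doubly-stochastic matrix W models a three-agent ring topology, and the TD error is clipped at a multiple of a running scale estimate (a Huber-style heuristic). The constants KAPPA and RHO and the helper names are invented for illustration.

```python
import numpy as np

# Minimal sketch of the two ingredients named in the abstract, under
# assumed details: tabular Q-values instead of a deep network, a fixed
# doubly-stochastic consensus matrix W for neighbor fusion, and
# Huber-style clipping of the TD error driven by an online scale
# estimate. All specifics here are illustrative, not from the paper.

N_AGENTS, N_STATES, N_ACTIONS = 3, 5, 2
ALPHA, GAMMA = 0.1, 0.9        # learning rate, discount factor
KAPPA = 2.0                    # outlier threshold, in scale units (assumed)
RHO = 0.05                     # smoothing rate for the scale estimate (assumed)

rng = np.random.default_rng(0)
Q = np.zeros((N_AGENTS, N_STATES, N_ACTIONS))   # one Q-table per agent
scale = np.ones(N_AGENTS)                        # online TD-error scale

# Doubly-stochastic weights for a 3-agent ring network: each agent mixes
# its own estimate with its neighbors' (decentralized fusion step).
W = np.array([[0.5 , 0.25, 0.25],
              [0.25, 0.5 , 0.25],
              [0.25, 0.25, 0.5 ]])

def robust_td_update(i, s, a, r, s_next):
    """Per-agent TD update with the error clipped at KAPPA * scale[i]."""
    delta = r + GAMMA * Q[i, s_next].max() - Q[i, s, a]
    # Online uncertainty estimate: exponential average of |TD error|.
    scale[i] = (1 - RHO) * scale[i] + RHO * abs(delta)
    clipped = np.clip(delta, -KAPPA * scale[i], KAPPA * scale[i])
    Q[i, s, a] += ALPHA * clipped

def fuse():
    """Consensus step: every agent averages Q-tables over its neighbors."""
    global Q
    Q = np.einsum('ij,jsa->isa', W, Q)

# Toy rollout on a random-walk chain with occasional corrupted rewards.
for step in range(2000):
    for i in range(N_AGENTS):
        s = rng.integers(N_STATES)
        a = rng.integers(N_ACTIONS)
        s_next = min(N_STATES - 1, max(0, s + (1 if a == 1 else -1)))
        r = float(s_next == N_STATES - 1)
        if rng.random() < 0.01:          # rare outlier reward
            r += rng.normal(0, 50)
        robust_td_update(i, s, a, r, s_next)
    fuse()

print(np.round(Q[0], 2))
```

The fusion step mixes each agent's estimate with its neighbors' on every iteration, so a single agent's corrupted updates are damped across the network, while the clipping step bounds the influence of any one outlier reward on the local update.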
