A fuzzy deterministic policy gradient algorithm for pursuit-evasion differential games

Abstract: Fuzzy inference systems combined with reinforcement learning are currently used in differential games to train agents that have no prior experience. However, reinforcement learning algorithms based on the actor-critic structure have the drawback that the policy depends on a probability distribution. In this paper, a novel fuzzy deterministic policy gradient algorithm is introduced and applied to classical 1-vs-1 constant-velocity pursuit-evasion differential games. The key goal is to self-learn the optimal strategy in the continuous action domain and to obtain fuzzy rules with a specific physical meaning. The proposed algorithm is based on the deterministic policy gradient theorem, and the agent learns a near-optimal strategy under the actor-critic structure. Fuzzy inference systems are used as the approximators, so that a specific physical meaning can be read from the linguistic fuzzy rules. The proposed algorithm is then applied to the decision-making problem of pursuit-evasion differential games. The results are compared with those of existing algorithms and show that the proposed algorithm outperforms them in precision and convergence efficiency.
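
To make the idea of a deterministic policy gradient update on a fuzzy actor concrete, the following is a minimal sketch and not the paper's implementation. It assumes a zero-order Takagi-Sugeno-style actor with Gaussian membership functions over a one-dimensional state, and it replaces the learned critic with a hand-written stand-in, Q(s, a) = -(a - 0.5 s)^2, so that the gradient with respect to the action is known exactly. All names (`CENTERS`, `WIDTH`, `toy_dq_da`, `dpg_step`, `train`) and all numeric settings are hypothetical and chosen only for illustration.

```python
import math
import random

# Hypothetical setup: 1-D state in [-1, 1], five fuzzy rules, scalar action.
CENTERS = [-1.0, -0.5, 0.0, 0.5, 1.0]  # Gaussian membership centers (assumed)
WIDTH = 0.25                           # shared membership width (assumed)


def firing_strengths(state):
    """Normalized firing strength of each rule for a scalar state."""
    raw = [math.exp(-0.5 * ((state - c) / WIDTH) ** 2) for c in CENTERS]
    total = sum(raw)
    return [r / total for r in raw]


def actor(state, theta):
    """Zero-order TSK output: firing-strength-weighted sum of consequents."""
    return sum(p * t for p, t in zip(firing_strengths(state), theta))


def toy_dq_da(state, action):
    """Stand-in critic gradient for Q(s, a) = -(a - 0.5 * s)**2 (not learned)."""
    return -2.0 * (action - 0.5 * state)


def dpg_step(state, theta, lr):
    """One deterministic policy gradient step on the rule consequents.

    For this actor, d(action)/d(theta_i) is the i-th firing strength, so the
    chain rule gives theta_i += lr * phi_i * dQ/da evaluated at a = actor(s).
    """
    phi = firing_strengths(state)
    action = sum(p * t for p, t in zip(phi, theta))
    grad_a = toy_dq_da(state, action)
    return [t + lr * p * grad_a for t, p in zip(theta, phi)]


def train(steps=5000, lr=0.1, seed=0):
    """Run DPG steps on states sampled uniformly from [-1, 1]."""
    rng = random.Random(seed)
    theta = [0.0] * len(CENTERS)
    for _ in range(steps):
        theta = dpg_step(rng.uniform(-1.0, 1.0), theta, lr)
    return theta


def mean_abs_error(theta, n=41):
    """Mean gap between the actor and the toy optimum 0.5 * s on a grid."""
    states = [-1.0 + 2.0 * i / (n - 1) for i in range(n)]
    return sum(abs(actor(s, theta) - 0.5 * s) for s in states) / n


theta = train()
print([round(t, 2) for t in theta])  # learned rule consequents
print(mean_abs_error(theta))         # remaining approximation error
```

Because each entry of `theta` is the consequent of one rule, the trained parameters can be read as linguistic rules of the form "IF the state is near center c_i THEN the action is about theta_i", which is the kind of interpretability the abstract refers to. In a full actor-critic setup the stand-in `toy_dq_da` would be replaced by the gradient of a learned critic.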
