A fuzzy reinforcement learning algorithm using a predictor for pursuit-evasion games

In a pursuit-evasion game, a pursuer whose strategy has been learned by a learning algorithm usually captures the evader when the game environment is similar to the environment the pursuer was trained in. However, the trained pursuer may fail to capture the evader if the environment of the game differs from the training environment. In this paper, we propose a fuzzy reinforcement learning algorithm that increases the pursuer's ability to capture the evader even when the game environment differs from the training environment. The proposed algorithm predicts the future position of the evader using a Kalman filter and then tunes the pursuer's fuzzy logic controller (FLC) so that the pursuer moves directly toward the expected position of the evader, where capture is expected to occur. We call the proposed algorithm the Kalman filter fuzzy actor-critic learning (KFFACL) algorithm. The KFFACL algorithm is applied to pursuit-evasion games whose environments differ from the training environment. Simulation results show that the proposed KFFACL algorithm outperforms state-of-the-art fuzzy reinforcement learning algorithms in terms of both the pursuer's ability to capture the evader and the capture time.
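The prediction step described in the abstract can be illustrated with a minimal sketch. It assumes a constant-velocity motion model for the evader with independent x and y axes, white-noise-acceleration process noise, and position-only measurements; the noise values `q` and `r`, the classes `AxisKF` and `EvaderTracker`, and the helper `intercept_point` are hypothetical and are not taken from the paper. The time to capture is found by a simple fixed-point iteration, which is also an assumption. The sketch covers only the Kalman filter prediction and the resulting heading toward the expected capture point; it does not implement the fuzzy actor-critic learning that KFFACL uses to tune the pursuer's FLC.

```python
import math

# Assumed model (not from the paper): constant-velocity evader, independent x/y
# axes, white-noise-acceleration process noise q, position-only measurements
# with variance r.


class AxisKF:
    """Constant-velocity Kalman filter for one axis; state is [position, velocity]."""

    def __init__(self, dt, q, r):
        self.dt, self.q, self.r = dt, q, r
        self.x = [0.0, 0.0]                 # [position, velocity]
        self.P = [[1e3, 0.0], [0.0, 1e3]]   # large initial uncertainty

    def predict(self):
        dt, q, P = self.dt, self.q, self.P
        self.x = [self.x[0] + dt * self.x[1], self.x[1]]
        # P <- F P F^T + Q with F = [[1, dt], [0, 1]]
        p00 = P[0][0] + dt * (P[0][1] + P[1][0]) + dt * dt * P[1][1]
        p01 = P[0][1] + dt * P[1][1]
        p10 = P[1][0] + dt * P[1][1]
        p11 = P[1][1]
        self.P = [[p00 + q * dt**4 / 4, p01 + q * dt**3 / 2],
                  [p10 + q * dt**3 / 2, p11 + q * dt**2]]

    def update(self, z):
        P = self.P
        s = P[0][0] + self.r                # innovation variance (H = [1, 0])
        k0, k1 = P[0][0] / s, P[1][0] / s   # Kalman gain
        y = z - self.x[0]                   # innovation: measured minus predicted position
        self.x = [self.x[0] + k0 * y, self.x[1] + k1 * y]
        # P <- (I - K H) P
        self.P = [[(1 - k0) * P[0][0], (1 - k0) * P[0][1]],
                  [P[1][0] - k1 * P[0][0], P[1][1] - k1 * P[0][1]]]


class EvaderTracker:
    """Hypothetical helper: one AxisKF per coordinate of the evader's position."""

    def __init__(self, dt, q=1e-3, r=1e-2):
        self.kx, self.ky = AxisKF(dt, q, r), AxisKF(dt, q, r)

    def step(self, z_xy):
        for kf, z in ((self.kx, z_xy[0]), (self.ky, z_xy[1])):
            kf.predict()
            kf.update(z)

    def predict_position(self, t_ahead):
        # Extrapolate the filtered state t_ahead seconds under constant velocity.
        return (self.kx.x[0] + t_ahead * self.kx.x[1],
                self.ky.x[0] + t_ahead * self.ky.x[1])


def intercept_point(tracker, pursuer_xy, pursuer_speed, iters=20):
    """Hypothetical helper: approximate the time-to-go t with
    |evader(t) - pursuer| = speed * t by fixed-point iteration,
    then return the heading toward that predicted point."""
    t = 0.0
    for _ in range(iters):
        tx, ty = tracker.predict_position(t)
        t = math.hypot(tx - pursuer_xy[0], ty - pursuer_xy[1]) / pursuer_speed
    tx, ty = tracker.predict_position(t)
    heading = math.atan2(ty - pursuer_xy[1], tx - pursuer_xy[0])
    return heading, (tx, ty), t


if __name__ == "__main__":
    dt = 0.1
    tracker = EvaderTracker(dt)
    for k in range(50):
        # Noiseless measurements of an evader moving at (1.0, 0.5) from (5, 5).
        tracker.step((5.0 + 1.0 * k * dt, 5.0 + 0.5 * k * dt))
    heading, target, t_go = intercept_point(tracker, (0.0, 0.0), pursuer_speed=2.0)
    print(f"heading {math.degrees(heading):.1f} deg, target {target}, time-to-go {t_go:.2f} s")
```

The fixed-point iteration converges when the pursuer is faster than the evader's estimated speed, because each iteration then shrinks the error in the time-to-go by at least the ratio of the two speeds.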
