Paper information - A Residual Gradient Fuzzy Reinforcement Learning Algorithm for Differential Games

A Residual Gradient Fuzzy Reinforcement Learning Algorithm for Differential Games

In this work, we propose a new fuzzy reinforcement learning algorithm for differential games with continuous state and action spaces. Existing algorithms in the literature use direct algorithms to update the parameters of their function approximation systems. The proposed algorithm instead uses the residual gradient value iteration algorithm to tune the input and output parameters of its function approximation systems. It has been shown in the literature that direct algorithms may fail to converge in some cases, whereas residual gradient algorithms are guaranteed to converge to a local minimum. The proposed algorithm is called the residual gradient fuzzy actor–critic learning (RGFACL) algorithm. It is used to learn three different pursuit–evasion differential games. Simulation results show that the RGFACL algorithm outperforms the fuzzy actor–critic learning and Q-learning fuzzy inference system algorithms in terms of convergence and speed of learning.
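The contrast between direct and residual gradient updates can be illustrated with a minimal sketch. This is not the RGFACL algorithm and it uses no fuzzy system: it assumes a one-parameter linear value function, a single repeatedly sampled transition with features 1 and 2 and zero reward (a well-known small counterexample for direct updates), and hypothetical values for the discount factor and learning rate. All function names are illustrative.

```python
import numpy as np

GAMMA = 0.9  # discount factor (assumed for illustration)
ALPHA = 0.1  # learning rate (assumed for illustration)


def td_error(w, phi_s, phi_next, reward):
    """Bellman residual for a linear value function V(s) = w . phi(s)."""
    return reward + GAMMA * (w @ phi_next) - w @ phi_s


def direct_step(w, phi_s, phi_next, reward):
    """Direct update: the bootstrapped target is treated as a constant."""
    delta = td_error(w, phi_s, phi_next, reward)
    return w + ALPHA * delta * phi_s


def residual_gradient_step(w, phi_s, phi_next, reward):
    """Residual gradient update: gradient descent on 0.5 * delta**2."""
    delta = td_error(w, phi_s, phi_next, reward)
    return w - ALPHA * delta * (GAMMA * phi_next - phi_s)


def run(step, n_steps=200):
    """Apply `step` repeatedly to one transition with features 1 -> 2, reward 0."""
    w = np.array([1.0])
    phi_s, phi_next = np.array([1.0]), np.array([2.0])
    for _ in range(n_steps):
        w = step(w, phi_s, phi_next, reward=0.0)
    return float(w[0])


w_direct = run(direct_step)               # weight grows without bound
w_residual = run(residual_gradient_step)  # weight shrinks toward 0
print(w_direct, w_residual)
```

In this toy setting the true value is zero, so the correct weight is 0. The direct update multiplies the weight by a factor greater than one at every step, while the residual gradient update multiplies it by a factor less than one, because it follows the full gradient of the squared Bellman residual, including the dependence of the target on the weight.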

Howard M. Schwartz | Mostafa D. Awheda
