A Residual Gradient Fuzzy Reinforcement Learning Algorithm for Differential Games

In this work, we propose a new fuzzy reinforcement learning algorithm for differential games with continuous state and action spaces. The algorithms in the literature update the parameters of their function approximation systems with direct algorithms. The proposed algorithm instead uses the residual gradient value iteration algorithm to tune the input and output parameters of its function approximation systems. It has been shown in the literature that direct algorithms may fail to converge in some cases, whereas residual gradient algorithms are guaranteed to converge to a local minimum. We call the proposed algorithm the residual gradient fuzzy actor–critic learning (RGFACL) algorithm and use it to learn three different pursuit–evasion differential games. Simulation results show that RGFACL outperforms the fuzzy actor–critic learning and Q-learning fuzzy inference system algorithms in terms of convergence and speed of learning.
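
The difference between a direct update and a residual gradient update can be illustrated with a minimal sketch. It is not the RGFACL algorithm itself: it assumes a linear value approximator V(s) = w · phi(s) in place of a fuzzy system, a single deterministic transition, and hypothetical function names chosen for this illustration. The direct update treats the successor value as a fixed target, while the residual gradient update performs gradient descent on the squared Bellman residual and therefore also differentiates through the successor value.

```python
# Sketch only: linear approximator and deterministic transition are assumptions,
# and all names here are hypothetical, not taken from the paper.

def value(w, phi):
    """Linear value estimate V(s) = w . phi(s)."""
    return sum(wi * pi for wi, pi in zip(w, phi))


def td_error(w, phi_s, phi_next, reward, gamma):
    """Bellman residual: delta = r + gamma * V(s') - V(s)."""
    return reward + gamma * value(w, phi_next) - value(w, phi_s)


def direct_update(w, phi_s, phi_next, reward, gamma, alpha):
    """Direct update: the target r + gamma * V(s') is treated as a constant."""
    delta = td_error(w, phi_s, phi_next, reward, gamma)
    return [wi + alpha * delta * ps for wi, ps in zip(w, phi_s)]


def residual_gradient_update(w, phi_s, phi_next, reward, gamma, alpha):
    """Residual gradient update: gradient descent on 0.5 * delta**2,
    so the gradient of V(s') with respect to w is included."""
    delta = td_error(w, phi_s, phi_next, reward, gamma)
    return [
        wi + alpha * delta * (ps - gamma * pn)
        for wi, ps, pn in zip(w, phi_s, phi_next)
    ]


if __name__ == "__main__":
    w0 = [0.0, 0.0]
    phi_s, phi_next = [1.0, 0.0], [0.0, 1.0]
    reward, gamma, alpha = 1.0, 0.9, 0.1

    w_direct = direct_update(w0, phi_s, phi_next, reward, gamma, alpha)
    w_resid = residual_gradient_update(w0, phi_s, phi_next, reward, gamma, alpha)

    print("direct weights:  ", w_direct)
    print("residual weights:", w_resid)
    print("residual after step:",
          td_error(w_resid, phi_s, phi_next, reward, gamma))
```

In this toy setting the two updates differ only in the component of the weight vector tied to the successor state: the direct update leaves it unchanged, and the residual gradient update moves it as well.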
