Temporal Difference based Tuning of Fuzzy Logic Controller through Reinforcement Learning to Control an Inverted Pendulum

This paper presents a self-tuning method for fuzzy logic controllers. The consequent part of the fuzzy logic controller is tuned automatically through the Q-learning algorithm of reinforcement learning. Q-learning is an off-policy temporal difference algorithm that directly approximates the action-value function yielding the maximum reward, and it is applied here in a continuous-state environment. The proposed approach retains the advantages of a fuzzy logic controller: it is robust under environmental uncertainties, and no expert knowledge is required to design the rule base.
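The core of the method is the off-policy temporal difference update of Q-learning. As a minimal sketch (assuming a tabular representation with illustrative state/action sets and hypothetical learning-rate and discount values, not the paper's actual fuzzy-consequent parameterization), the update rule looks like this:

```python
# Hedged sketch: one tabular Q-learning (off-policy TD) step.
# ALPHA (learning rate) and GAMMA (discount) are assumed values
# for illustration; the paper's own settings are not specified here.
ALPHA, GAMMA = 0.1, 0.99

def td_update(Q, s, a, r, s_next, actions):
    """Move Q(s, a) toward the bootstrapped target
    r + GAMMA * max_a' Q(s_next, a'), and return the new Q(s, a).

    Off-policy: the max over next actions is used regardless of
    which action the behavior policy actually takes next.
    """
    best_next = max(Q[(s_next, a2)] for a2 in actions)
    target = r + GAMMA * best_next
    Q[(s, a)] += ALPHA * (target - Q[(s, a)])
    return Q[(s, a)]
```

In the paper's setting, the learned Q-values drive the adjustment of the fuzzy rule consequents rather than being used directly as a lookup-table policy.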
