Policy gradient fuzzy reinforcement learning

This work presents a new approach for tuning the conclusions of fuzzy rules based on reinforcement learning. Unlike most existing fuzzy reinforcement learning algorithms, which are based on value functions, our approach, called policy gradient fuzzy reinforcement learning (PGFRL), relies on gradient estimation. In PGFRL, the GPOMDP algorithm is employed to estimate the performance gradient with respect to the parameters of the fuzzy rules. We prove that the parameters of the fuzzy rules converge to a local optimum under the necessary conditions. Experimental results show the effectiveness of PGFRL.
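For reference, a minimal sketch of the GPOMDP estimator in this setting, using the notation of Baxter and Bartlett (2001); the paper's own symbols and policy parameterization may differ. Assuming a stochastic policy \(\mu(u \mid \theta, y)\) whose parameters \(\theta\) are the fuzzy rule conclusions, GPOMDP maintains an eligibility trace \(z_t\) and a running gradient estimate \(\Delta_t\):
\[
z_{t+1} = \beta z_t + \nabla_\theta \log \mu(u_t \mid \theta, y_t), \qquad
\Delta_{t+1} = \Delta_t + \frac{1}{t+1}\bigl( r(x_{t+1})\, z_{t+1} - \Delta_t \bigr),
\]
where \(\beta \in [0,1)\) controls the bias-variance trade-off of the estimate. As the horizon grows, \(\Delta_T\) approaches a biased approximation of the performance gradient \(\nabla_\theta \eta(\theta)\), which can then drive a gradient-ascent update of the fuzzy rule conclusions, e.g. \(\theta \leftarrow \theta + \gamma \Delta_T\) with a step size \(\gamma\) (assumed here for illustration).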
