Reinforcement Distribution in Continuous State Action Space Fuzzy Q-Learning: A Novel Approach

Fuzzy Q-learning extends the Q-learning algorithm to work in the presence of continuous state and action spaces. A Takagi-Sugeno Fuzzy Inference System (FIS) is used to infer the continuous executed action and its action-value through the cooperation of several rules. Different evolutions of the FIS parameters are possible, depending on the strategy used to distribute the reinforcement signal. In this paper, we compare two strategies: the classical one, which rewards the rules that proposed the actions composed to produce the executed action, and a new one we introduce, which rewards the rules whose proposed actions are closest to the one actually executed.
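
The following Python sketch illustrates, under simplifying assumptions, how the two reinforcement-distribution strategies differ in a fuzzy Q-learning update. It assumes a scalar state, a one-dimensional continuous action, Gaussian rule memberships, and a small discrete set of candidate actions per rule; the constants, the similarity measure, and the update details are illustrative choices, not the paper's exact formulation.

```python
# Minimal sketch of fuzzy Q-learning with two reinforcement-distribution
# strategies (assumed setup: 1-D state and action, Gaussian memberships,
# a fixed set of candidate actions per rule).
#   - "classical": credit proportional to the firing strength of the rules
#     that composed the executed action;
#   - "closest":   credit re-weighted toward rules whose chosen candidate
#     lies nearest to the action actually executed.

import numpy as np

rng = np.random.default_rng(0)

N_RULES = 5                                  # rules of the Takagi-Sugeno FIS
CANDIDATES = np.linspace(-1.0, 1.0, 7)       # candidate actions per rule (assumed)
centers = np.linspace(0.0, 1.0, N_RULES)     # rule antecedent centers on the state
SIGMA = 0.15                                 # membership width (assumed)
ALPHA, GAMMA, EPS = 0.1, 0.95, 0.1

q = np.zeros((N_RULES, len(CANDIDATES)))     # q-value of each (rule, candidate)

def firing_strengths(s):
    """Normalized rule activations for a scalar state s."""
    phi = np.exp(-0.5 * ((s - centers) / SIGMA) ** 2)
    return phi / phi.sum()

def choose(s):
    """Each rule picks a candidate (epsilon-greedy); the FIS blends them."""
    phi = firing_strengths(s)
    picks = np.empty(N_RULES, dtype=int)
    for i in range(N_RULES):
        if rng.random() < EPS:
            picks[i] = rng.integers(len(CANDIDATES))
        else:
            picks[i] = int(np.argmax(q[i]))
    action = float(phi @ CANDIDATES[picks])              # composed continuous action
    q_sa = float(phi @ q[np.arange(N_RULES), picks])     # inferred action-value
    return action, picks, phi, q_sa

def update(s, picks, phi, q_sa, r, s_next, strategy="classical"):
    """Distribute the TD error over the rules according to the chosen strategy."""
    phi_next = firing_strengths(s_next)
    v_next = float(phi_next @ q.max(axis=1))             # greedy value of next state
    delta = r + GAMMA * v_next - q_sa
    if strategy == "classical":
        credit = phi                                     # reward the composing rules
    else:  # "closest": favour rules whose pick is near the executed action
        executed = float(phi @ CANDIDATES[picks])
        dist = np.abs(CANDIDATES[picks] - executed)
        sim = np.exp(-dist / (np.abs(CANDIDATES).max() + 1e-8))
        credit = phi * sim
        credit /= credit.sum()
    q[np.arange(N_RULES), picks] += ALPHA * credit * delta

# One illustrative step on a toy problem: the reward is highest when the
# executed action matches the state mapped onto [-1, 1].
s = rng.random()
a, picks, phi, q_sa = choose(s)
r = -abs(a - (2 * s - 1))
s_next = rng.random()
update(s, picks, phi, q_sa, r, s_next, strategy="closest")
```

The only change between the two strategies is the credit vector used in the last line of `update`: the classical rule spreads the temporal-difference error according to firing strengths alone, while the alternative additionally weights each rule by how close its locally proposed action is to the executed one.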
