Reinforcement Learning (RL) is widely regarded as an appropriate paradigm for acquiring control policies in autonomous learning agents that operate without initial knowledge, because RL learns from simple "evaluative" or "critic" feedback rather than the "instructive" information used in Supervised Learning. Two well-known types of RL are Actor-Critic Learning and Q-Learning. Among them, Q-Learning (Watkins & Dayan, 1992) is the most widely used learning paradigm because of its simplicity and solid theoretical background. In Q-Learning, a Q-vector of state-action values is maintained, and an action is selected by choosing the entry with the highest Q-value.

Unfortunately, conventional Q-Learning can only handle discrete states and actions, whereas in the real world the learning agent needs to deal with continuous states and actions. For instance, in robotic applications the robot needs to respond to dynamically changing environmental states with the smoothest action possible; furthermore, the robot's hardware can be damaged as a result of inappropriate discrete actions. In order to handle continuous states and actions, many researchers have enhanced the Q-Learning methodology over the years. Continuous Action Q-Learning (CAQL) (Millan et al., 2002) is one such methodology. Although CAQL is better than the conventional Q-Learning technique, it is not as popular as Fuzzy Q-Learning (FQL) (Jouffe, 1998) because it is not based on a solid theoretical background: CAQL generates continuous actions from the neighboring actions of the highest Q-valued action, whereas FQL uses a theoretically sound Fuzzy Inference System (FIS). Consequently, the FQL approach is more favorable than CAQL, and our proposed approach is based on the FQL technique.

FIS identification is carried out in two phases, namely the structure identification phase and the parameter identification phase. The structure identification phase defines how fuzzy rules are generated, while the parameter identification phase determines the premise parameters and the consequent parts of the fuzzy rules. The FQL approach focuses mainly on handling parameter identification automatically, while structure identification remains an open issue in FQL. To circumvent the issue of structure identification, the Dynamic Fuzzy Q-Learning (DFQL) approach (Er & Deng, 2004) has been proposed. The salient feature of DFQL is that it can generate fuzzy rules online according to the ε-completeness and Temporal Difference (TD) criteria.
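To make the contrast above concrete, the following minimal Python sketch shows the standard discrete Q-Learning update and, next to it, how an FIS-style weighted combination of per-rule actions produces a single continuous action, which is the core idea exploited by FQL. The state/action sizes, Gaussian membership functions and rule parameters are illustrative assumptions, not the formulation used later in this chapter.

import numpy as np

# --- Conventional (discrete) Q-Learning: one Q-value per state-action pair ---
n_states, n_actions = 10, 4
Q = np.zeros((n_states, n_actions))        # the "Q-vectors", one row per state
alpha, gamma, epsilon = 0.1, 0.9, 0.1      # learning rate, discount factor, exploration rate

def select_action(s):
    """Epsilon-greedy selection: usually the action with the highest Q-value."""
    if np.random.rand() < epsilon:
        return np.random.randint(n_actions)
    return int(np.argmax(Q[s]))

def q_update(s, a, r, s_next):
    """One-step Q-Learning (Watkins & Dayan, 1992) update."""
    td_target = r + gamma * np.max(Q[s_next])
    Q[s, a] += alpha * (td_target - Q[s, a])

# --- FQL flavor: a fuzzy inference system blends rule consequents into a continuous action ---
# Illustrative values only: premise centres/widths and candidate actions are made up.
rule_centres = np.array([-1.0, 0.0, 1.0])  # premise (membership-function) centres
rule_widths  = np.array([0.5, 0.5, 0.5])   # premise widths
rule_actions = np.array([-2.0, 0.0, 2.0])  # consequent action recommended by each rule

def fuzzy_action(x):
    """Continuous action = firing-strength-weighted average of the rule actions."""
    firing = np.exp(-((x - rule_centres) ** 2) / (2.0 * rule_widths ** 2))
    return float(np.dot(firing, rule_actions) / np.sum(firing))

print(fuzzy_action(0.3))   # a smooth action between the neighboring rule actions

The weighted average over firing strengths is what allows FQL to output smooth, continuous actions even though each individual fuzzy rule only recommends a single crisp action.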
References

[1] Meng Joo Er, et al., "Online tuning of fuzzy inference systems using dynamic fuzzy Q-learning," IEEE Trans. Syst. Man Cybern. Part B, 2004.
[2] Bernd Fritzke, et al., "A Growing Neural Gas Network Learns Topologies," NIPS, 1994.
[3] Shen Furao, et al., "An incremental network for on-line unsupervised classification and topology learning," Neural Networks, 2006.
[4] Teuvo Kohonen, et al., "Self-organized formation of topologically correct feature maps," Biological Cybernetics, 2004.
[5] Richard S. Sutton, et al., "Learning to predict by the methods of temporal differences," Machine Learning, 1988.
[6] Meng Joo Er, et al., "A novel framework for automatic generation of fuzzy neural networks," Neurocomputing, 2008.
[7] Meng Joo Er, et al., Theory and Novel Applications of Machine Learning, 2009.
[8] Richard S. Sutton, et al., Reinforcement Learning: An Introduction, IEEE Trans. Neural Networks, 1998.
[9] Thomas G. Dietterich, "What is machine learning?," Archives of Disease in Childhood, 2020.
[10] Lionel Jouffe, et al., "Fuzzy inference system learning by reinforcement methods," IEEE Trans. Syst. Man Cybern. Part C, 1998.