Actor-critic learning based on a fuzzy inference system

Actor-critic learning is a reinforcement learning method used to find an optimal agent behavior. The only information available for learning is the feedback provided by the system (reward/punishment). Initially, this method was analyzed for discrete states and actions, in which case the value and policy functions can be represented by lookup tables. Most real-world problems, however, have large input spaces and/or continuous actions, so other function approximators must be used to introduce generalization. The actor-critic learning presented in this paper uses a fuzzy inference system (FIS) to generalize between states sharing the same fuzzy properties and, in the continuous-action case, between actions. Using an FIS rather than a global function approximator such as a neural network has two major advantages: the inherent locality of an FIS permits the introduction of human knowledge, and it confines the learning process to only the parameters implicated in the current state.
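
To make the scheme concrete, the following is a minimal sketch of actor-critic learning over an FIS, assuming a zero-order Takagi-Sugeno system with triangular membership functions on a one-dimensional state, a critic that holds one value weight per rule, and an actor that mixes per-rule action candidates by firing strength with Gaussian exploration. The class name, the update rule for the actor (a stochastic real-valued-unit style update), and all hyperparameters are illustrative assumptions, not the paper's exact formulation.

```python
import numpy as np

class FuzzyActorCritic:
    """Sketch of a fuzzy actor-critic: one critic weight (v) and one
    actor action candidate (w) per fuzzy rule."""

    def __init__(self, centers, alpha=0.1, beta=0.05, gamma=0.95, sigma=0.2):
        self.centers = np.asarray(centers, dtype=float)  # MF centers over the state
        self.v = np.zeros(len(self.centers))  # critic: value weight per rule
        self.w = np.zeros(len(self.centers))  # actor: action candidate per rule
        self.alpha, self.beta, self.gamma, self.sigma = alpha, beta, gamma, sigma

    def firing_strengths(self, s):
        """Normalized triangular memberships (assumes evenly spaced centers).
        Only rules neighboring the current state fire: this is the locality
        property that restricts each update to the implicated parameters."""
        width = self.centers[1] - self.centers[0]
        phi = np.maximum(0.0, 1.0 - np.abs(s - self.centers) / width)
        return phi / phi.sum()

    def value(self, s):
        # Critic output: firing-strength-weighted sum of rule values.
        return self.firing_strengths(s) @ self.v

    def action(self, s):
        # Actor output: continuous action as a weighted mix of the rules'
        # action candidates, plus Gaussian exploration noise.
        mean = self.firing_strengths(s) @ self.w
        return mean + np.random.normal(0.0, self.sigma), mean

    def update(self, s, a, mean, r, s_next):
        phi = self.firing_strengths(s)
        td = r + self.gamma * self.value(s_next) - self.value(s)  # TD error
        self.v += self.alpha * td * phi               # critic update, local to fired rules
        self.w += self.beta * td * (a - mean) * phi   # actor: reinforce explored deviation
        return td

# Toy usage: drive a 1-D state toward zero; reward is negative squared state.
fac = FuzzyActorCritic(centers=np.linspace(-1.0, 1.0, 7))
s = 0.8
for _ in range(2000):
    a, mean = fac.action(s)
    s_next = np.clip(s + 0.1 * a, -1.0, 1.0)
    fac.update(s, a, mean, r=-s_next**2, s_next=s_next)
    s = s_next
```

Two aspects of the sketch mirror the advantages claimed above: human knowledge can be injected by initializing `w` with expert-suggested action candidates for each fuzzy region, and because the firing-strength vector is zero everywhere except near the current state, each update modifies only the few rules that actually fired.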