Paper Information - TD based reinforcement learning using neural networks in control problems with continuous action space

TD based reinforcement learning using neural networks in control problems with continuous action space

While most research on reinforcement learning has assumed a discrete control space, many real-world control problems require continuous outputs. This can be achieved by using continuous mapping functions for the value and action functions of the reinforcement learning architecture. Two questions arise, however: what sort of function representation to use, and how to determine the amount of noise for searching the action space. Here, the ubiquitous back-propagation neural network is used to learn the value and action functions. In addition, a reinforcement predictor is introduced that predicts the next reinforcement and also determines the amount of noise to add to the controller output. In a computer simulation with the ball and beam system as an example plant, the proposed reinforcement learning architecture is found to have sound online learning control performance.
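The abstract gives no equations, so the following is only a minimal sketch of the two ideas it names: a one-step TD error for training a value function, and search noise whose size is set by a predicted reinforcement. The function names, the assumption that reinforcement lies in [0, 1], and the rule "higher predicted reinforcement means less noise" are illustrative assumptions, not the authors' formulation.

```python
import random


def td_error(reward, value, next_value, gamma=0.95):
    """One-step temporal-difference error: r + gamma * V(s') - V(s)."""
    return reward + gamma * next_value - value


def noise_scale(predicted_reinforcement, sigma_max=1.0):
    """Map a predicted reinforcement to an exploration standard deviation.

    Assumption (not from the paper): reinforcement lies in [0, 1], and a
    higher prediction means less search noise is needed.
    """
    clipped = min(max(predicted_reinforcement, 0.0), 1.0)
    return sigma_max * (1.0 - clipped)


def explore(action, predicted_reinforcement, rng, sigma_max=1.0):
    """Perturb a continuous controller output with zero-mean Gaussian noise."""
    sigma = noise_scale(predicted_reinforcement, sigma_max)
    return action + rng.gauss(0.0, sigma)


rng = random.Random(0)
# Low predicted reinforcement: the output is perturbed noticeably.
noisy_action = explore(0.2, predicted_reinforcement=0.25, rng=rng)
# Maximal predicted reinforcement: sigma is 0, so the output is unchanged.
exact_action = explore(0.2, predicted_reinforcement=1.0, rng=rng)
```

In a full controller, the value and action functions would each be a neural network trained by back-propagation, with the TD error serving as the training signal; that part is omitted here to keep the sketch short.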

Se-Young Oh | Jeong-Hoon Lee | Doo-Hyun Choi

[1] Yuhong Jiang, et al. Application of neural networks for real time control of a ball-beam system, 1995, Proceedings of ICNN'95 - International Conference on Neural Networks.

[2] Ashwin Ram, et al. Experiments with Reinforcement Learning in Problems with Continuous State and Action Spaces, 1997, Adapt. Behav.

[3] Richard S. Sutton, et al. Neuronlike adaptive elements that can solve difficult learning control problems, 1983, IEEE Transactions on Systems, Man, and Cybernetics.

[4] Ishwar K. Sethi, et al. Road-following with continuous learning, 1995, Proceedings of the Intelligent Vehicles '95 Symposium.

[5] Vijaykumar Gullapalli, et al. A stochastic reinforcement learning algorithm for learning real-valued functions, 1990, Neural Networks.

[6] Hyung Suck Cho, et al. A sensor-based navigation for a mobile robot using fuzzy logic and reinforcement learning, 1995, IEEE Trans. Syst. Man Cybern.

[7] C.W. Anderson, et al. Learning to control an inverted pendulum using neural networks, 1989, IEEE Control Systems Magazine.

[8] Andrew W. Moore, et al. Reinforcement Learning: A Survey, 1996, J. Artif. Intell. Res.