Paper information - Reinforcement learning model, algorithms and its application

Reinforcement learning (RL) originates from animal learning theory. RL does not need prior knowledge; it can autonomously obtain an optimal policy from knowledge gained through trial and error and continuous interaction with a dynamic environment. Its self-improving and online-learning characteristics make reinforcement learning one of the core technologies of intelligent agents. In this paper, we first survey the model and theory of reinforcement learning. We then present the main reinforcement learning algorithms, including Sarsa, temporal difference, Q-learning and function approximation. Finally, we briefly introduce some applications of reinforcement learning and point out some future research directions.

Wang Qiang | Zhan Zhongli
