Reinforcement learning model, algorithms and its application

Reinforcement learning (RL) originates in animal learning theory. RL requires no prior knowledge: an agent autonomously obtains an optimal policy from the knowledge it gains through trial and error and continual interaction with a dynamic environment. Its capacity for self-improvement and online learning has made reinforcement learning one of the core technologies of intelligent agents. In this paper, we first survey the model and theory of reinforcement learning. We then give a comprehensive presentation of the main reinforcement learning algorithms, including Sarsa, temporal difference learning, Q-learning and function approximation. Finally, we briefly introduce some applications of reinforcement learning and point out some directions for future research.
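As an illustration of the trial-and-error learning described above, the following is a minimal sketch of tabular Q-learning, one of the algorithms named in the abstract. The environment is an assumption made only for this example and is not taken from the paper: a hypothetical five-state chain in which the agent starts at state 0, moves left or right, and receives a reward of 1 on reaching the last state. The function names `q_learning_chain` and `greedy_policy` and all parameter values are likewise illustrative.

```python
import random


def q_learning_chain(n_states=5, episodes=500, alpha=0.1, gamma=0.9,
                     epsilon=0.1, seed=0):
    """Tabular Q-learning on a hypothetical chain (illustrative only).

    Actions: 0 = move left, 1 = move right. The episode ends when the
    agent reaches the last state, which yields a reward of 1.
    """
    rng = random.Random(seed)
    goal = n_states - 1
    # q[s][a] is the current estimate of the value of action a in state s.
    q = [[0.0, 0.0] for _ in range(n_states)]

    for _ in range(episodes):
        s = 0
        while s != goal:
            # Epsilon-greedy action selection, breaking ties at random.
            if rng.random() < epsilon:
                a = rng.randrange(2)
            else:
                best = max(q[s])
                a = rng.choice([i for i in (0, 1) if q[s][i] == best])

            # Environment step: state 0 is a wall on the left.
            s_next = max(s - 1, 0) if a == 0 else s + 1
            r = 1.0 if s_next == goal else 0.0

            # Q-learning update: bootstrap from the best next action,
            # with no bootstrapping from the terminal state.
            future = 0.0 if s_next == goal else gamma * max(q[s_next])
            q[s][a] += alpha * (r + future - q[s][a])
            s = s_next
    return q


def greedy_policy(q):
    """Return the greedy action for each state under the table q."""
    return [max((0, 1), key=lambda a: row[a]) for row in q]


if __name__ == "__main__":
    table = q_learning_chain()
    print(greedy_policy(table))
```

The agent starts with no knowledge of the chain (all estimates are zero) and improves its policy only from the rewards it observes, which is the sense in which RL needs no prior knowledge. Replacing `max(q[s_next])` with the value of the action actually chosen in the next state would give a Sarsa-style update instead.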
