An Introduction to Reinforcement Learning

This paper surveys the historical basis of reinforcement learning and some of the current work from a computer scientist’s point of view. It is an outgrowth of a number of talks given by the authors, including a NATO Advanced Study Institute and tutorials at AAAI’94 and Machine Learning’94. Reinforcement learning is a popular model of the learning problems that are encountered by an agent that learns behavior through trial-and-error interactions with a dynamic environment. It has a strong family resemblance to work in psychology, but differs considerably in the details and in the use of the word “reinforcement.” It is appropriately thought of as a class of problems, rather than as a set of techniques. The paper addresses a variety of subproblems in reinforcement learning, including exploration vs. exploitation, learning from delayed reinforcement, learning and using models, generalization and hierarchy, and hidden state. It concludes with a survey of some practical systems and an assessment of the practical utility of current reinforcement-learning systems.
