An Introduction to Reinforcement Learning

This paper surveys the historical basis of reinforcement learning and some of the current work from a computer scientist’s point of view. It is an outgrowth of a number of talks given by the authors, including a NATO Advanced Study Institute and tutorials at AAAI’94 and Machine Learning’94. Reinforcement learning is a popular model of the learning problems that are encountered by an agent that learns behavior through trial-and-error interactions with a dynamic environment. It has a strong family resemblance to work in psychology, but differs considerably in the details and in the use of the word “reinforcement.” It is appropriately thought of as a class of problems, rather than as a set of techniques. The paper addresses a variety of subproblems in reinforcement learning, including exploration vs. exploitation, learning from delayed reinforcement, learning and using models, generalization and hierarchy, and hidden state. It concludes with a survey of some practical systems and an assessment of the practical utility of current reinforcement-learning systems.

Andrew W. Moore | Leslie Pack Kaelbling | Michael L. Littman
