论文信息 - Reinforcement Learning: A Survey

Reinforcement Learning: A Survey

This paper surveys the field of reinforcement learning from a computer-science perspective. It is written to be accessible to researchers familiar with machine learning. Both the historical basis of the field and a broad selection of current work are summarized. Reinforcement learning is the problem faced by an agent that learns behavior through trial-and-error interactions with a dynamic environment. The work described here has a resemblance to work in psychology, but differs considerably in the details and in the use of the word "reinforcement." The paper discusses central issues of reinforcement learning, including trading off exploration and exploitation, establishing the foundations of the field via Markov decision theory, learning from delayed reinforcement, constructing empirical models to accelerate learning, making use of generalization and hierarchy, and coping with hidden state. It concludes with a survey of some implemented systems and an assessment of the practical utility of current methods for reinforcement learning.

[1] Arthur L. Samuel,et al. Some Studies in Machine Learning Using the Game of Checkers , 1967, IBM J. Res. Dev..

[2] Ronald A. Howard,et al. Dynamic Programming and Markov Processes , 1960 .

[3] F. d'Epenoux,et al. A Probabilistic Production and Inventory Problem , 1963 .

[4] R. Karp,et al. On Nonterminating Stochastic Games , 1966 .

[5] Cyrus Derman,et al. Finite State Markovian Decision Processes , 1970 .

[6] Kumpati S. Narendra,et al. Learning Automata - A Survey , 1974, IEEE Trans. Syst. Man Cybern..

[7] John H. Holland,et al. Adaptation in Natural and Artificial Systems: An Introductory Analysis with Applications to Biology, Control, and Artificial Intelligence , 1992 .

[8] James S. Albus,et al. New Approach to Manipulator Control: The Cerebellar Model Articulation Controller (CMAC)1 , 1975 .

[9] Raymond J. Bandlow. Theories of Learning, 4th Edition. By Ernest R. Hilgard and Gordon H. Bower. Englewood Cliffs, N.J.: Prentice-Hall, Inc., 1975 , 1976 .

[10] M. Puterman,et al. Modified Policy Iteration Algorithms for Discounted Markov Decision Problems , 1978 .

[11] G. Siouris,et al. Optimum systems control , 1979, Proceedings of the IEEE.

[12] Alexander Graham,et al. Introduction to Control Theory, Including Optimal Control , 1980 .

[13] R. Mortensen. Introduction to Control Theory, Including Optimal Control (David Burghes and Alexander Graham) , 1982 .

[14] Richard S. Sutton,et al. Neuronlike adaptive elements that can solve difficult learning control problems , 1983, IEEE Transactions on Systems, Man, and Cybernetics.

[15] R.M. Dunn,et al. Brains, behavior, and robotics , 1983, Proceedings of the IEEE.

[16] Leslie G. Valiant,et al. A theory of the learnable , 1984, STOC '84.

[17] Richard S. Sutton,et al. Temporal credit assignment in reinforcement learning , 1984 .

[18] Charles W. Anderson,et al. Learning and problem-solving with multilayer connectionist systems (adaptive, strategy learning, neural networks, reinforcement learning) , 1986 .

[19] James L. McClelland,et al. James L. McClelland, David Rumelhart and the PDP Research Group, Parallel distributed processing: explorations in the microstructure of cognition . Vol. 1. Foundations . Vol. 2. Psychological and biological models . Cambridge MA: M.I.T. Press, 1987. , 1989, Journal of Child Language.

[20] Pravin Varaiya,et al. Stochastic Systems: Estimation, Identification, and Adaptive Control , 1986 .

[21] R. Stengel. Stochastic Optimal Control: Theory and Application , 1986 .

[22] James L. McClelland,et al. Parallel distributed processing: explorations in the microstructure of cognition, vol. 1: foundations , 1986 .

[23] Dimitri P. Bertsekas,et al. Dynamic Programming: Deterministic and Stochastic Models , 1987 .

[24] Ullrich Rüde. Mathematical and Computational Techniques for Multilevel Adaptive Methods , 1987 .

[25] George E. P. Box,et al. Empirical Model‐Building and Response Surfaces , 1988 .

[26] P. W. Jones,et al. Bandit Problems, Sequential Allocation of Experiments , 1987 .

[27] David K. Smith. Theory of Linear and Integer Programming , 1987 .

[28] David E. Goldberg,et al. Genetic Algorithms in Search Optimization and Machine Learning , 1988 .

[29] W. Cleveland,et al. Locally Weighted Regression: An Approach to Regression Analysis by Local Fitting , 1988 .

[30] Kumpati S. Narendra,et al. Learning automata - an introduction , 1989 .

[31] C. Watkins. Learning from delayed rewards , 1989 .

[32] John N. Tsitsiklis,et al. Parallel and distributed computation , 1989 .

[33] D. Bertsekas,et al. Adaptive aggregation methods for infinite horizon dynamic programming , 1989 .

[34] Christian M. Ernst,et al. Multi-armed Bandit Allocation Indices , 1989 .

[35] David H. Ackley,et al. Generalization and Scaling in Reinforcement Learning , 1989, NIPS.

[36] Jürgen Schmidhuber,et al. Reinforcement Learning in Markovian and Non-Markovian Environments , 1990, NIPS.

[37] Rodney A. Brooks,et al. Learning to Coordinate Behaviors , 1990, AAAI.

[38] Vijaykumar Gullapalli,et al. A stochastic reinforcement learning algorithm for learning real-valued functions , 1990, Neural Networks.

[39] Richard S. Sutton,et al. Integrated Architectures for Learning, Planning, and Reacting Based on Approximating Dynamic Programming , 1990, ML.

[40] A. Moore. Variable Resolution Dynamic Programming , 1991, ML.

[41] Hamid R. Berenji. Artificial Neural Networks and Approximate Reasoning for Intelligent Control in Space , 1991 .

[42] Leslie Pack Kaelbling,et al. Input Generalization in Delayed Reinforcement Learning: An Algorithm and Performance Comparisons , 1991, IJCAI.

[43] Sridhar Mahadevan,et al. Scaling Reinforcement Learning to Robotics by Exploiting the Subsumption Architecture , 1991, ML.

[44] Chuen-Chien Lee,et al. A self‐learning rule‐based controller employing approximate reasoning and neural net concepts , 1991, Int. J. Intell. Syst..

[45] Steven D. Whitehead,et al. Complexity and Cooperation in Q-Learning , 1991, ML.

[46] Long Ji Lin,et al. Programming Robots Using Reinforcement Learning and Teaching , 1991, AAAI.

[47] Richard S. Sutton,et al. Planning by Incremental Dynamic Programming , 1991, ML.

[48] Christopher G. Atkeson,et al. Memory-Based Learning Control , 1991, 1991 American Control Conference.

[49] Jürgen Schmidhuber,et al. Curious model-building control systems , 1991, [Proceedings] 1991 IEEE International Joint Conference on Neural Networks.

[50] H. Berenji. Artificial Neural Networks and Approximate Reasoning for Intelligent Control in Space , 1991, 1991 American Control Conference.

[51] W. Lovejoy. A survey of algorithmic methods for partially observed Markov decision processes , 1991 .

[52] Anne Condon,et al. The Complexity of Stochastic Games , 1992, Inf. Comput..

[53] Geoffrey E. Hinton,et al. Feudal Reinforcement Learning , 1992, NIPS.

[54] Long-Ji Lin,et al. Reinforcement learning for robots using neural networks , 1992 .

[55] Gerald Tesauro,et al. Practical Issues in Temporal Difference Learning , 1992, Mach. Learn..

[56] Sridhar Mahadevan,et al. Automatic Programming of Behavior-Based Robots Using Reinforcement Learning , 1991, Artif. Intell..

[57] Lonnie Chrisman,et al. Reinforcement Learning with Perceptual Aliasing: The Perceptual Distinctions Approach , 1992, AAAI.

[58] D. Sofge. THE ROLE OF EXPLORATION IN LEARNING CONTROL , 1992 .

[59] Satinder P. Singh,et al. Reinforcement Learning with a Hierarchy of Abstract Models , 1992, AAAI.

[60] Vijaykumar Gullapalli,et al. Reinforcement learning and its application to control , 1992 .

[61] Long Lin,et al. Memory Approaches to Reinforcement Learning in Non-Markovian Domains , 1992 .