A tutorial survey of reinforcement learning

This paper presents a compact, self-contained tutorial survey of reinforcement learning, a tool that is finding increasing application in the development of intelligent dynamic systems. Research on reinforcement learning over the past decade has produced a variety of useful algorithms; this survey reviews the literature and presents those algorithms within a cohesive framework.
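Among the algorithms developed during that decade, tabular Q-learning (Watkins, 1989) is perhaps the most widely used. As a minimal illustrative sketch (not the paper's own presentation), the agent maintains a table Q(s, a), acts epsilon-greedily, and updates each entry toward the reward plus the discounted value of the best next action; the `step` environment below is a hypothetical three-state chain invented for the example.

```python
import random

def q_learning(n_states, n_actions, step, episodes=300,
               alpha=0.5, gamma=0.9, eps=0.1):
    """Tabular Q-learning: learn Q(s, a) from sampled transitions.

    `step(s, a)` must return (next_state, reward, done).
    """
    Q = [[0.0] * n_actions for _ in range(n_states)]
    for _ in range(episodes):
        s, done = 0, False
        while not done:
            # Epsilon-greedy action selection.
            if random.random() < eps:
                a = random.randrange(n_actions)
            else:
                a = max(range(n_actions), key=lambda x: Q[s][x])
            s2, r, done = step(s, a)
            # One-step temporal-difference update toward r + gamma * max_a' Q(s', a').
            target = r if done else r + gamma * max(Q[s2])
            Q[s][a] += alpha * (target - Q[s][a])
            s = s2
    return Q

# Hypothetical 3-state chain for illustration: action 1 moves right
# (reward 1 on reaching the terminal state 2), action 0 stays put.
def step(s, a):
    if a == 1:
        s2 = s + 1
        return s2, (1.0 if s2 == 2 else 0.0), s2 == 2
    return s, 0.0, False
```

After a few hundred episodes the learned values favor moving right in every state, i.e. the greedy policy with respect to Q is optimal for this chain.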
