A tutorial survey of reinforcement learning

This paper presents a compact, self-contained tutorial survey of reinforcement learning, a tool that is finding increasing application in the development of intelligent dynamic systems. Research on reinforcement learning over the past decade has produced a variety of useful algorithms; this survey reviews the literature and presents those algorithms within a cohesive framework.
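Among the algorithms developed during that decade, tabular Q-learning (Watkins, 1989) is perhaps the most widely used. As a minimal illustrative sketch (not the paper's own presentation), the agent maintains a table Q(s, a), acts epsilon-greedily, and updates each entry toward the reward plus the discounted value of the best next action; the `step` environment below is a hypothetical three-state chain invented for the example.

```python
import random

def q_learning(n_states, n_actions, step, episodes=300,
               alpha=0.5, gamma=0.9, eps=0.1):
    """Tabular Q-learning: learn Q(s, a) from sampled transitions.

    `step(s, a)` must return (next_state, reward, done).
    """
    Q = [[0.0] * n_actions for _ in range(n_states)]
    for _ in range(episodes):
        s, done = 0, False
        while not done:
            # Epsilon-greedy action selection.
            if random.random() < eps:
                a = random.randrange(n_actions)
            else:
                a = max(range(n_actions), key=lambda x: Q[s][x])
            s2, r, done = step(s, a)
            # One-step temporal-difference update toward r + gamma * max_a' Q(s', a').
            target = r if done else r + gamma * max(Q[s2])
            Q[s][a] += alpha * (target - Q[s][a])
            s = s2
    return Q

# Hypothetical 3-state chain for illustration: action 1 moves right
# (reward 1 on reaching the terminal state 2), action 0 stays put.
def step(s, a):
    if a == 1:
        s2 = s + 1
        return s2, (1.0 if s2 == 2 else 0.0), s2 == 2
    return s, 0.0, False
```

After a few hundred episodes the learned values favor moving right in every state, i.e. the greedy policy with respect to Q is optimal for this chain.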
