A Tutorial Survey of Reinforcement Learning

This paper gives a compact, self-contained tutorial survey of reinforcement learning, a tool that is increasingly finding application in the development of intelligent dynamic systems. Research on reinforcement learning during the past decade has led to the development of a variety of useful algorithms. This paper surveys the literature and presents the algorithms in a cohesive framework.
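As a flavor of the algorithms the survey covers, Watkins-style Q-learning (the subject of several of the references below, e.g. [40], [64], [76], [83]) can be sketched on a toy problem. The chain MDP, step function, and parameter values here are illustrative assumptions for this sketch, not taken from the paper:

```python
import random

random.seed(0)

# Toy 5-state chain MDP (a hypothetical example): states 0..4,
# actions 0 = left, 1 = right; entering state 4 pays reward 1
# and ends the episode.
N_STATES, ACTIONS = 5, (0, 1)

def step(s, a):
    s2 = min(s + 1, N_STATES - 1) if a == 1 else max(s - 1, 0)
    done = (s2 == N_STATES - 1)
    return s2, (1.0 if done else 0.0), done

Q = [[0.0, 0.0] for _ in range(N_STATES)]   # tabular action-value estimates
alpha, gamma, epsilon = 0.5, 0.9, 0.3

for _ in range(2000):                        # episodes with random start states
    s = random.randrange(N_STATES - 1)
    for _ in range(50):                      # step limit per episode
        if random.random() < epsilon:        # epsilon-greedy exploration
            a = random.choice(ACTIONS)
        else:
            a = max(ACTIONS, key=lambda x: Q[s][x])
        s2, r, done = step(s, a)
        # One-step Q-learning update: bootstrap from the greedy value of s2.
        Q[s][a] += alpha * (r + gamma * max(Q[s2]) - Q[s][a])
        s = s2
        if done:
            break

# The greedy policy after learning should move right from every
# non-terminal state.
policy = [max(ACTIONS, key=lambda x: Q[s][x]) for s in range(N_STATES - 1)]
print(policy)
```

The key feature, stressed throughout the surveyed literature, is that the update needs only sampled transitions, not a model of the environment's dynamics.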

[1]  P. B. Coaker,et al.  Applied Dynamic Programming , 1964 .

[2]  A. L. Samuel,et al.  Some Studies in Machine Learning Using the Game of Checkers , 1967, IBM J. Res. Dev..

[3]  A G Barto,et al.  Toward a modern theory of adaptive networks: expectation and prediction. , 1981, Psychological review.

[4]  R. Sutton,et al.  Simulation of anticipatory responses in classical conditioning by a neuron-like adaptive element , 1982, Behavioural Brain Research.

[5]  John S. Edwards,et al.  The Hedonistic Neuron: A Theory of Memory, Learning and Intelligence , 1983 .

[6]  Richard S. Sutton,et al.  Temporal credit assignment in reinforcement learning , 1984 .

[7]  Richard S. Sutton,et al.  Training and Tracking in Robotics , 1985, IJCAI.

[8]  A G Barto,et al.  Learning by statistical cooperation of self-interested neuron-like computing elements. , 1985, Human neurobiology.

[9]  Patchigolla Kiran Kumar,et al.  A Survey of Some Results in Stochastic Adaptive Control , 1985 .

[10]  P. Anandan,et al.  Cooperativity in Networks of Pattern Recognizing Stochastic Learning Automata , 1986 .

[11]  Rodney A. Brooks,et al.  Achieving Artificial Intelligence through Building Robots , 1986 .

[12]  Andrew G. Barto,et al.  Game-theoretic cooperativity in networks of self-interested units , 1987 .

[13]  Dimitri P. Bertsekas,et al.  Dynamic Programming: Deterministic and Stochastic Models , 1987 .

[14]  Paul J. Werbos,et al.  Building and Understanding Adaptive Systems: A Statistical/Numerical Approach to Factory Automation and Brain Research , 1987, IEEE Transactions on Systems, Man, and Cybernetics.

[15]  Charles W. Anderson,et al.  Strategy Learning with Multilayer Connectionist Representations , 1987 .

[16]  PAUL J. WERBOS,et al.  Generalization of backpropagation with application to a recurrent gas market model , 1988, Neural Networks.

[17]  A. Klopf A neuronal model of classical conditioning , 1988 .

[18]  Kumpati S. Narendra,et al.  Learning automata - an introduction , 1989 .

[19]  John N. Tsitsiklis,et al.  Parallel and distributed computation , 1989 .

[20]  C.W. Anderson,et al.  Learning to control an inverted pendulum using neural networks , 1989, IEEE Control Systems Magazine.

[21]  A. Barto,et al.  Learning and Sequential Decision Making , 1989 .

[22]  Michael I. Jordan,et al.  Learning to Control an Unstable System with Forward Modeling , 1989, NIPS.

[23]  L. Baird,et al.  A Mathematical Analysis of Actor-Critic Architectures for Learning Optimal Controls Through Incremental Dynamic Programming , 1990 .

[24]  Peter Dayan,et al.  Navigating Through Temporal Difference , 1990, NIPS.

[25]  Michael C. Mozer,et al.  Discovering the Structure of a Reactive Environment by Exploration , 1990, Neural Computation.

[26]  Richard E. Korf,et al.  Real-Time Heuristic Search , 1990, Artif. Intell..

[27]  Michael I. Jordan,et al.  A_{R-P} learning applied to a network model of cortical area 7a , 1990, 1990 IJCNN International Joint Conference on Neural Networks.

[28]  Richard S. Sutton,et al.  Integrated Modeling and Control Based on Reinforcement Learning and Dynamic Programming , 1990, NIPS 1990.

[29]  David Chapman,et al.  Vision, instruction, and action , 1990 .

[30]  Paul J. Werbos,et al.  Consistency of HDP applied to a simple reinforcement learning problem , 1990, Neural Networks.

[31]  John C. Platt  Learning by Combining Memorization and Gradient Descent , 1990, NIPS.

[32]  Jacques J. Vidal,et al.  Adaptive Range Coding , 1990, NIPS.

[33]  Dana H. Ballard,et al.  Active Perception and Reinforcement Learning , 1990, Neural Computation.

[34]  M. Gabriel,et al.  Learning and Computational Neuroscience: Foundations of Adaptive Networks , 1990 .

[35]  Jonathan Bachrach,et al.  A Connectionist Learning Control Architecture for Navigation , 1990, NIPS.

[36]  Ming Tan,et al.  Learning a Cost-Sensitive Internal Representation for Reinforcement Learning , 1991, ML.

[37]  Andrew G. Barto,et al.  On the Computational Economics of Reinforcement Learning , 1991 .

[38]  Leslie Pack Kaelbling,et al.  Input Generalization in Delayed Reinforcement Learning: An Algorithm and Performance Comparisons , 1991, IJCAI.

[39]  Sridhar Mahadevan,et al.  Scaling Reinforcement Learning to Robotics by Exploiting the Subsumption Architecture , 1991, ML.

[40]  Steven D. Whitehead,et al.  Complexity and Cooperation in Q-Learning , 1991, ML.

[41]  Carlos D. Brody,et al.  Fast Learning with Predictive Forward Models , 1991, NIPS.

[42]  V. Gullapalli,et al.  A comparison of supervised and reinforcement learning methods on a reinforcement learning task , 1991, Proceedings of the 1991 IEEE International Symposium on Intelligent Control.

[43]  Anders Krogh,et al.  Introduction to the theory of neural computation , 1994, The advanced book program.

[44]  Steven D. Whitehead,et al.  A Complexity Analysis of Cooperative Mechanisms in Reinforcement Learning , 1991, AAAI.

[45]  Long Ji Lin,et al.  Programming Robots Using Reinforcement Learning and Teaching , 1991, AAAI.

[46]  Richard S. Sutton,et al.  Planning by Incremental Dynamic Programming , 1991, ML.

[47]  Sebastian Thrun,et al.  Active Exploration in Dynamic Environments , 1991, NIPS.

[48]  Hyongsuk Kim,et al.  CMAC-based adaptive critic self-learning control , 1991, IEEE Trans. Neural Networks.

[49]  Satinder P. Singh,et al.  Transfer of Learning Across Compositions of Sequential Tasks , 1991, ML.

[50]  P. Dayan Reinforcing connectionism : learning the statistical way , 1991 .

[51]  Geoffrey E. Hinton,et al.  Adaptive Mixtures of Local Experts , 1991, Neural Computation.

[52]  Long-Ji Lin,et al.  Self-improving reactive agents: case studies of reinforcement learning frameworks , 1991 .

[53]  Paul E. Utgoff,et al.  Two Kinds of Training Information For Evaluation Function Learning , 1991, AAAI.

[54]  Michael P. Wellman,et al.  Planning and Control , 1991 .

[55]  Long Ji Lin,et al.  Self-improvement Based on Reinforcement Learning, Planning and Teaching , 1991, ML.

[56]  R.J. Williams,et al.  Reinforcement learning is direct adaptive optimal control , 1991, IEEE Control Systems.

[57]  Geoffrey E. Hinton,et al.  Feudal Reinforcement Learning , 1992, NIPS.

[58]  Andrew W. Moore,et al.  Memory-Based Reinforcement Learning: Efficient Computation with Prioritized Sweeping , 1992, NIPS.

[59]  Michael I. Jordan,et al.  Forward Models: Supervised Learning with a Distal Teacher , 1992, Cogn. Sci..

[60]  J. Millán,et al.  A Reinforcement Connectionist Approach to Robot Path Finding in Non-Maze-Like Environments , 1992, Machine Learning.

[61]  Satinder P. Singh,et al.  Scaling Reinforcement Learning Algorithms by Learning Variable Temporal Resolution Models , 1992, ML.

[62]  Steven J. Bradtke,et al.  Reinforcement Learning Applied to Linear Quadratic Regulation , 1992, NIPS.

[63]  Satinder P. Singh,et al.  Reinforcement Learning with a Hierarchy of Abstract Models , 1992, AAAI.

[64]  Charles W. Anderson,et al.  Q-Learning with Hidden-Unit Restarting , 1992, NIPS.

[65]  Vijaykumar Gullapalli,et al.  Reinforcement learning and its application to control , 1992 .

[66]  Donald A. Sofge,et al.  Handbook of Intelligent Control: Neural, Fuzzy, and Adaptive Approaches , 1992 .

[67]  L.-J. Lin,et al.  Hierarchical learning of robot skills by reinforcement , 1993, IEEE International Conference on Neural Networks.

[68]  Leslie Pack Kaelbling,et al.  Learning in embedded systems , 1993 .

[69]  Roderic A. Grupen,et al.  Robust Reinforcement Learning in Motion Planning , 1993, NIPS.

[70]  Sebastian Thrun,et al.  Exploration and model building in mobile robot domains , 1993, IEEE International Conference on Neural Networks.

[71]  Andrew G. Barto,et al.  Convergence of Indirect Adaptive Asynchronous Value Iteration Algorithms , 1993, NIPS.

[72]  Jing Peng,et al.  Efficient Learning and Planning Within the Dyna Framework , 1993, Adapt. Behav..

[73]  Peter D. Lawrence,et al.  Transition Point Dynamic Programming , 1993, NIPS.

[74]  Michael I. Jordan,et al.  Massachusetts Institute of Technology Artificial Intelligence Laboratory and Center for Biological and Computational Learning, Department of Brain and Cognitive Sciences , 1996 .

[75]  Richard W. Prager,et al.  A Modular Q-Learning Architecture for Manipulator Task Decomposition , 1994, ICML.

[76]  Mahesan Niranjan,et al.  On-line Q-learning using connectionist systems , 1994 .

[77]  B Ravindran,et al.  A tutorial survey of reinforcement learning , 1994 .

[78]  V. Gullapalli,et al.  Acquiring robot skills via reinforcement learning , 1994, IEEE Control Systems.

[79]  Richard S. Sutton,et al.  A Menu of Designs for Reinforcement Learning Over Time , 1995 .

[80]  Steven J. Bradtke,et al.  Incremental dynamic programming for on-line adaptive optimal control , 1995 .

[81]  Sebastian Thrun,et al.  Issues in Using Function Approximation for Reinforcement Learning , 1993 .

[82]  Satinder Singh  Transfer of learning by composing solutions of elemental sequential tasks , 1992, Machine Learning.

[83]  John N. Tsitsiklis,et al.  Asynchronous Stochastic Approximation and Q-Learning , 1994, Machine Learning.

[84]  Gerald Tesauro,et al.  Practical issues in temporal difference learning , 1992, Machine Learning.

[85]  Terrence J. Sejnowski,et al.  TD(λ) Converges with Probability 1 , 1994, Machine Learning.

[86]  Satinder Singh,et al.  An upper bound on the loss from approximate optimal-value functions , 1994, Machine Learning.

[87]  Long Ji Lin,et al.  Self-improving reactive agents based on reinforcement learning, planning and teaching , 1992, Machine Learning.

[88]  Richard S. Sutton,et al.  Landmark learning: An illustration of associative search , 1981, Biological Cybernetics.

[89]  Justin A. Boyan,et al.  Modular Neural Networks for Learning Context-Dependent Game Strategies , 1992 .

[90]  D. Bertsekas  Distributed Dynamic Programming , 1982, IEEE Transactions on Automatic Control.