A tutorial survey of reinforcement learning
[1] Arthur L. Samuel,et al. Some Studies in Machine Learning Using the Game of Checkers , 1967, IBM J. Res. Dev..
[2] Stuart E. Dreyfus,et al. Applied Dynamic Programming , 1965 .
[3] A. L. Samuel,et al. Some studies in machine learning using the game of checkers. II: recent progress , 1967 .
[4] A. L. Samuel,et al. Some Studies in Machine Learning Using the Game of Checkers , 1967, IBM J. Res. Dev..
[5] A. H. Klopf,et al. Brain Function and Adaptive Systems: A Heterostatic Theory , 1972 .
[6] James S. Albus,et al. New Approach to Manipulator Control: The Cerebellar Model Articulation Controller (CMAC) , 1975 .
[7] Dimitri Bertsekas,et al. Distributed dynamic programming , 1981, 1981 20th IEEE Conference on Decision and Control including the Symposium on Adaptive Processes.
[8] A G Barto,et al. Toward a modern theory of adaptive networks: expectation and prediction. , 1981, Psychological review.
[9] R. Sutton,et al. Simulation of anticipatory responses in classical conditioning by a neuron-like adaptive element , 1982, Behavioural Brain Research.
[10] John S. Edwards,et al. The Hedonistic Neuron: A Theory of Memory, Learning and Intelligence , 1983 .
[11] Richard S. Sutton,et al. Neuronlike adaptive elements that can solve difficult learning control problems , 1983, IEEE Transactions on Systems, Man, and Cybernetics.
[12] Richard S. Sutton,et al. Temporal credit assignment in reinforcement learning , 1984 .
[13] P. Anandan,et al. Pattern-recognizing stochastic learning automata , 1985, IEEE Transactions on Systems, Man, and Cybernetics.
[14] Richard S. Sutton,et al. Training and Tracking in Robotics , 1985, IJCAI.
[15] A G Barto,et al. Learning by statistical cooperation of self-interested neuron-like computing elements. , 1985, Human neurobiology.
[16] Patchigolla Kiran Kumar,et al. A Survey of Some Results in Stochastic Adaptive Control , 1985 .
[17] Charles W. Anderson,et al. Learning and problem-solving with multilayer connectionist systems (adaptive, strategy learning, neural networks, reinforcement learning) , 1986 .
[18] P. Anandan,et al. Cooperativity in Networks of Pattern Recognizing Stochastic Learning Automata , 1986 .
[19] Rodney A. Brooks,et al. Achieving Artificial Intelligence through Building Robots , 1986 .
[20] Andrew G. Barto,et al. Game-theoretic cooperativity in networks of self-interested units , 1987 .
[21] Dimitri P. Bertsekas,et al. Dynamic Programming: Deterministic and Stochastic Models , 1987 .
[22] Paul J. Werbos,et al. Building and Understanding Adaptive Systems: A Statistical/Numerical Approach to Factory Automation and Brain Research , 1987, IEEE Transactions on Systems, Man, and Cybernetics.
[23] Charles W. Anderson,et al. Strategy Learning with Multilayer Connectionist Representations , 1987 .
[24] PAUL J. WERBOS,et al. Generalization of backpropagation with application to a recurrent gas market model , 1988, Neural Networks.
[25] A. Klopf. A neuronal model of classical conditioning , 1988 .
[26] J. F. Shepanski,et al. Teaching Artificial Neural Systems To Drive: Manual Training Techniques For Autonomous Systems , 1988, Other Conferences.
[27] Paul J. Werbos,et al. Neural networks for control and system identification , 1989, Proceedings of the 28th IEEE Conference on Decision and Control.
[28] Kumpati S. Narendra,et al. Learning automata - an introduction , 1989 .
[29] C. Watkins. Learning from delayed rewards , 1989 .
[30] John N. Tsitsiklis,et al. Parallel and distributed computation , 1989 .
[31] C.W. Anderson,et al. Learning to control an inverted pendulum using neural networks , 1989, IEEE Control Systems Magazine.
[32] A. Barto,et al. Learning and Sequential Decision Making , 1989 .
[33] Michael I. Jordan,et al. Learning to Control an Unstable System with Forward Modeling , 1989, NIPS.
[34] L. Baird,et al. A MATHEMATICAL ANALYSIS OF ACTOR-CRITIC ARCHITECTURES FOR LEARNING OPTIMAL CONTROLS THROUGH INCREMENTAL DYNAMIC PROGRAMMING , 1990 .
[35] Peter Dayan,et al. Navigating Through Temporal Difference , 1990, NIPS.
[36] Michael C. Mozer,et al. Discovering the Structure of a Reactive Environment by Exploration , 1990, Neural Computation.
[37] Richard E. Korf,et al. Real-Time Heuristic Search , 1990, Artif. Intell..
[38] Michael I. Jordan,et al. A R-P learning applied to a network model of cortical area 7a , 1990, 1990 IJCNN International Joint Conference on Neural Networks.
[39] Richard S. Sutton,et al. Integrated Modeling and Control Based on Reinforcement Learning and Dynamic Programming , 1990, NIPS 1990.
[40] Vijaykumar Gullapalli,et al. A stochastic reinforcement learning algorithm for learning real-valued functions , 1990, Neural Networks.
[41] David Chapman,et al. Vision, instruction, and action , 1990 .
[42] Andrew W. Moore,et al. Efficient memory-based learning for robot control , 1990 .
[43] Paul J. Werbos,et al. Consistency of HDP applied to a simple reinforcement learning problem , 1990, Neural Networks.
[44] John C. Platt. Learning by Combining Memorization and Gradient Descent , 1990, NIPS.
[45] Jacques J. Vidal,et al. Adaptive Range Coding , 1990, NIPS.
[46] Richard S. Sutton,et al. Time-Derivative Models of Pavlovian Reinforcement , 1990 .
[47] Dana H. Ballard,et al. Active Perception and Reinforcement Learning , 1990, Neural Computation.
[48] M. Gabriel,et al. Learning and Computational Neuroscience: Foundations of Adaptive Networks , 1990 .
[49] Jonathan Bachrach,et al. A Connectionist Learning Control Architecture for Navigation , 1990, NIPS.
[50] Ming Tan,et al. Learning a Cost-Sensitive Internal Representation for Reinforcement Learning , 1991, ML.
[51] A. Moore. Variable Resolution Dynamic Programming , 1991, ML.
[52] Andrew G. Barto,et al. On the Computational Economics of Reinforcement Learning , 1991 .
[53] Leslie Pack Kaelbling,et al. Input Generalization in Delayed Reinforcement Learning: An Algorithm and Performance Comparisons , 1991, IJCAI.
[54] Sridhar Mahadevan,et al. Scaling Reinforcement Learning to Robotics by Exploiting the Subsumption Architecture , 1991, ML.
[55] Steven D. Whitehead,et al. Complexity and Cooperation in Q-Learning , 1991, ML.
[56] Carlos D. Brody,et al. Fast Learning with Predictive Forward Models , 1991, NIPS.
[57] V. Gullapalli,et al. A comparison of supervised and reinforcement learning methods on a reinforcement learning task , 1991, Proceedings of the 1991 IEEE International Symposium on Intelligent Control.
[58] Anders Krogh,et al. Introduction to the theory of neural computation , 1994, The advanced book program.
[59] Lawrence Birnbaum,et al. Machine learning : proceedings of the Eighth International Workshop (ML91) , 1991 .
[60] Steven D. Whitehead,et al. A Complexity Analysis of Cooperative Mechanisms in Reinforcement Learning , 1991, AAAI.
[61] Long Ji Lin,et al. Programming Robots Using Reinforcement Learning and Teaching , 1991, AAAI.
[62] Richard S. Sutton,et al. Planning by Incremental Dynamic Programming , 1991, ML.
[63] Sebastian Thrun,et al. Active Exploration in Dynamic Environments , 1991, NIPS.
[64] Hyongsuk Kim,et al. CMAC-based adaptive critic self-learning control , 1991, IEEE Trans. Neural Networks.
[65] Satinder P. Singh,et al. Transfer of Learning Across Compositions of Sequential Tasks , 1991, ML.
[66] P. Dayan. Reinforcing connectionism : learning the statistical way , 1991 .
[67] Geoffrey E. Hinton,et al. Adaptive Mixtures of Local Experts , 1991, Neural Computation.
[68] Long-Ji Lin,et al. Self-improving reactive agents: case studies of reinforcement learning frameworks , 1991 .
[69] Paul E. Utgoff,et al. Two Kinds of Training Information For Evaluation Function Learning , 1991, AAAI.
[70] Michael P. Wellman,et al. Planning and Control , 1991 .
[71] Long Ji Lin,et al. Self-improvement Based on Reinforcement Learning, Planning and Teaching , 1991, ML.
[72] Alexander Linden. On Discontinuous Q-Functions in Reinforcement Learning , 1992, GWAI.
[73] R.J. Williams,et al. Reinforcement learning is direct adaptive optimal control , 1991, IEEE Control Systems.
[74] S. Thrun. Efficient Exploration in Reinforcement Learning , 1992 .
[75] Geoffrey E. Hinton,et al. Feudal Reinforcement Learning , 1992, NIPS.
[76] Andrew W. Moore,et al. Memory-Based Reinforcement Learning: Efficient Computation with Prioritized Sweeping , 1992, NIPS.
[77] Michael I. Jordan,et al. Forward Models: Supervised Learning with a Distal Teacher , 1992, Cogn. Sci..
[78] J. Millán,et al. A Reinforcement Connectionist Approach to Robot Path Finding in Non-Maze-Like Environments , 2004, Machine Learning.
[79] Satinder P. Singh,et al. Scaling Reinforcement Learning Algorithms by Learning Variable Temporal Resolution Models , 1992, ML.
[80] G. Tesauro. Practical Issues in Temporal Difference Learning , 1992 .
[81] Steven J. Bradtke,et al. Reinforcement Learning Applied to Linear Quadratic Regulation , 1992, NIPS.
[82] Satinder P. Singh,et al. Reinforcement Learning with a Hierarchy of Abstract Models , 1992, AAAI.
[83] Charles W. Anderson,et al. Q-Learning with Hidden-Unit Restarting , 1992, NIPS.
[84] Vijaykumar Gullapalli,et al. Reinforcement learning and its application to control , 1992 .
[85] Sebastian Thrun,et al. Efficient Exploration In Reinforcement Learning , 1992 .
[86] Donald A. Sofge,et al. Handbook of Intelligent Control: Neural, Fuzzy, and Adaptive Approaches , 1992 .
[87] L.-J. Lin,et al. Hierarchical learning of robot skills by reinforcement , 1993, IEEE International Conference on Neural Networks.
[88] Leslie Pack Kaelbling,et al. Learning in embedded systems , 1993 .
[89] Roderic A. Grupen,et al. Robust Reinforcement Learning in Motion Planning , 1993, NIPS.
[90] Sebastian Thrun,et al. Exploration and model building in mobile robot domains , 1993, IEEE International Conference on Neural Networks.
[91] Andrew G. Barto,et al. Convergence of Indirect Adaptive Asynchronous Value Iteration Algorithms , 1993, NIPS.
[92] Jing Peng,et al. Efficient Learning and Planning Within the Dyna Framework , 1993, Adapt. Behav..
[93] Peter D. Lawrence,et al. Transition Point Dynamic Programming , 1993, NIPS.
[94] Michael I. Jordan,et al. Massachusetts Institute of Technology Artificial Intelligence Laboratory and Center for Biological and Computational Learning, Department of Brain and Cognitive Sciences , 1996 .
[95] Richard W. Prager,et al. A Modular Q-Learning Architecture for Manipulator Task Decomposition , 1994, ICML.
[96] Mahesan Niranjan,et al. On-line Q-learning using connectionist systems , 1994 .
[97] John N. Tsitsiklis,et al. Asynchronous stochastic approximation and Q-learning , 1994, Mach. Learn..
[98] V. Gullapalli,et al. Acquiring robot skills via reinforcement learning , 1994, IEEE Control Systems.
[99] Richard S. Sutton,et al. A Menu of Designs for Reinforcement Learning Over Time , 1995 .
[100] Steven J. Bradtke,et al. Incremental dynamic programming for on-line adaptive optimal control , 1995 .