A Tutorial Survey of Reinforcement Learning
[1] P. B. Coaker, et al. Applied Dynamic Programming, 1964.
[2] A. L. Samuel, et al. Some Studies in Machine Learning Using the Game of Checkers, 1967, IBM J. Res. Dev.
[3] A. G. Barto, et al. Toward a modern theory of adaptive networks: expectation and prediction, 1981, Psychological Review.
[4] R. Sutton, et al. Simulation of anticipatory responses in classical conditioning by a neuron-like adaptive element, 1982, Behavioural Brain Research.
[5] John S. Edwards, et al. The Hedonistic Neuron: A Theory of Memory, Learning and Intelligence, 1983.
[6] Richard S. Sutton, et al. Temporal credit assignment in reinforcement learning, 1984.
[7] Richard S. Sutton, et al. Training and Tracking in Robotics, 1985, IJCAI.
[8] A. G. Barto, et al. Learning by statistical cooperation of self-interested neuron-like computing elements, 1985, Human Neurobiology.
[9] Patchigolla Kiran Kumar, et al. A Survey of Some Results in Stochastic Adaptive Control, 1985.
[10] P. Anandan, et al. Cooperativity in Networks of Pattern Recognizing Stochastic Learning Automata, 1986.
[11] Rodney A. Brooks, et al. Achieving Artificial Intelligence through Building Robots, 1986.
[12] Andrew G. Barto, et al. Game-theoretic cooperativity in networks of self-interested units, 1987.
[13] Dimitri P. Bertsekas, et al. Dynamic Programming: Deterministic and Stochastic Models, 1987.
[14] Paul J. Werbos, et al. Building and Understanding Adaptive Systems: A Statistical/Numerical Approach to Factory Automation and Brain Research, 1987, IEEE Transactions on Systems, Man, and Cybernetics.
[15] Charles W. Anderson, et al. Strategy Learning with Multilayer Connectionist Representations, 1987.
[16] Paul J. Werbos, et al. Generalization of backpropagation with application to a recurrent gas market model, 1988, Neural Networks.
[17] A. Klopf. A neuronal model of classical conditioning, 1988.
[18] Kumpati S. Narendra, et al. Learning Automata: An Introduction, 1989.
[19] John N. Tsitsiklis, et al. Parallel and Distributed Computation, 1989.
[20] C. W. Anderson, et al. Learning to control an inverted pendulum using neural networks, 1989, IEEE Control Systems Magazine.
[21] A. Barto, et al. Learning and Sequential Decision Making, 1989.
[22] Michael I. Jordan, et al. Learning to Control an Unstable System with Forward Modeling, 1989, NIPS.
[23] L. Baird, et al. A Mathematical Analysis of Actor-Critic Architectures for Learning Optimal Controls Through Incremental Dynamic Programming, 1990.
[24] Peter Dayan, et al. Navigating Through Temporal Difference, 1990, NIPS.
[25] Michael C. Mozer, et al. Discovering the Structure of a Reactive Environment by Exploration, 1990, Neural Computation.
[26] Richard E. Korf, et al. Real-Time Heuristic Search, 1990, Artif. Intell.
[27] Michael I. Jordan, et al. A R-P learning applied to a network model of cortical area 7a, 1990, IJCNN International Joint Conference on Neural Networks.
[28] Richard S. Sutton, et al. Integrated Modeling and Control Based on Reinforcement Learning and Dynamic Programming, 1990, NIPS.
[29] David Chapman, et al. Vision, Instruction, and Action, 1990.
[30] Paul J. Werbos, et al. Consistency of HDP applied to a simple reinforcement learning problem, 1990, Neural Networks.
[31] John C. Platt. Learning by Combining Memorization and Gradient Descent, 1990, NIPS.
[32] Jacques J. Vidal, et al. Adaptive Range Coding, 1990, NIPS.
[33] Dana H. Ballard, et al. Active Perception and Reinforcement Learning, 1990, Neural Computation.
[34] M. Gabriel, et al. Learning and Computational Neuroscience: Foundations of Adaptive Networks, 1990.
[35] Jonathan Bachrach, et al. A Connectionist Learning Control Architecture for Navigation, 1990, NIPS.
[36] Ming Tan, et al. Learning a Cost-Sensitive Internal Representation for Reinforcement Learning, 1991, ML.
[37] Andrew G. Barto, et al. On the Computational Economics of Reinforcement Learning, 1991.
[38] Leslie Pack Kaelbling, et al. Input Generalization in Delayed Reinforcement Learning: An Algorithm and Performance Comparisons, 1991, IJCAI.
[39] Sridhar Mahadevan, et al. Scaling Reinforcement Learning to Robotics by Exploiting the Subsumption Architecture, 1991, ML.
[40] Steven D. Whitehead, et al. Complexity and Cooperation in Q-Learning, 1991, ML.
[41] Carlos D. Brody, et al. Fast Learning with Predictive Forward Models, 1991, NIPS.
[42] V. Gullapalli, et al. A comparison of supervised and reinforcement learning methods on a reinforcement learning task, 1991, Proceedings of the 1991 IEEE International Symposium on Intelligent Control.
[43] Anders Krogh, et al. Introduction to the Theory of Neural Computation, 1994, The Advanced Book Program.
[44] Steven D. Whitehead, et al. A Complexity Analysis of Cooperative Mechanisms in Reinforcement Learning, 1991, AAAI.
[45] Long Ji Lin, et al. Programming Robots Using Reinforcement Learning and Teaching, 1991, AAAI.
[46] Richard S. Sutton, et al. Planning by Incremental Dynamic Programming, 1991, ML.
[47] Sebastian Thrun, et al. Active Exploration in Dynamic Environments, 1991, NIPS.
[48] Hyongsuk Kim, et al. CMAC-based adaptive critic self-learning control, 1991, IEEE Trans. Neural Networks.
[49] Satinder P. Singh, et al. Transfer of Learning Across Compositions of Sequential Tasks, 1991, ML.
[50] P. Dayan. Reinforcing connectionism: learning the statistical way, 1991.
[51] Geoffrey E. Hinton, et al. Adaptive Mixtures of Local Experts, 1991, Neural Computation.
[52] Long-Ji Lin, et al. Self-improving reactive agents: case studies of reinforcement learning frameworks, 1991.
[53] Paul E. Utgoff, et al. Two Kinds of Training Information for Evaluation Function Learning, 1991, AAAI.
[54] Michael P. Wellman, et al. Planning and Control, 1991.
[55] Long Ji Lin, et al. Self-improvement Based on Reinforcement Learning, Planning and Teaching, 1991, ML.
[56] R. J. Williams, et al. Reinforcement learning is direct adaptive optimal control, 1991, IEEE Control Systems.
[57] Geoffrey E. Hinton, et al. Feudal Reinforcement Learning, 1992, NIPS.
[58] Andrew W. Moore, et al. Memory-Based Reinforcement Learning: Efficient Computation with Prioritized Sweeping, 1992, NIPS.
[59] Michael I. Jordan, et al. Forward Models: Supervised Learning with a Distal Teacher, 1992, Cogn. Sci.
[60] J. Millán, et al. A Reinforcement Connectionist Approach to Robot Path Finding in Non-Maze-Like Environments, 1992, Machine Learning.
[61] Satinder P. Singh, et al. Scaling Reinforcement Learning Algorithms by Learning Variable Temporal Resolution Models, 1992, ML.
[62] Steven J. Bradtke, et al. Reinforcement Learning Applied to Linear Quadratic Regulation, 1992, NIPS.
[63] Satinder P. Singh, et al. Reinforcement Learning with a Hierarchy of Abstract Models, 1992, AAAI.
[64] Charles W. Anderson, et al. Q-Learning with Hidden-Unit Restarting, 1992, NIPS.
[65] Vijaykumar Gullapalli, et al. Reinforcement learning and its application to control, 1992.
[66] Donald A. Sofge, et al. Handbook of Intelligent Control: Neural, Fuzzy, and Adaptive Approaches, 1992.
[67] L.-J. Lin, et al. Hierarchical learning of robot skills by reinforcement, 1993, IEEE International Conference on Neural Networks.
[68] Leslie Pack Kaelbling, et al. Learning in Embedded Systems, 1993.
[69] Roderic A. Grupen, et al. Robust Reinforcement Learning in Motion Planning, 1993, NIPS.
[70] Sebastian Thrun, et al. Exploration and model building in mobile robot domains, 1993, IEEE International Conference on Neural Networks.
[71] Andrew G. Barto, et al. Convergence of Indirect Adaptive Asynchronous Value Iteration Algorithms, 1993, NIPS.
[72] Jing Peng, et al. Efficient Learning and Planning Within the Dyna Framework, 1993, Adapt. Behav.
[73] Peter D. Lawrence, et al. Transition Point Dynamic Programming, 1993, NIPS.
[74] Michael I. Jordan, et al. Massachusetts Institute of Technology Artificial Intelligence Laboratory and Center for Biological and Computational Learning, Department of Brain and Cognitive Sciences, 1996.
[75] Richard W. Prager, et al. A Modular Q-Learning Architecture for Manipulator Task Decomposition, 1994, ICML.
[76] Mahesan Niranjan, et al. On-line Q-learning using connectionist systems, 1994.
[77] B. Ravindran, et al. A tutorial survey of reinforcement learning, 1994.
[78] V. Gullapalli, et al. Acquiring robot skills via reinforcement learning, 1994, IEEE Control Systems.
[79] Richard S. Sutton, et al. A Menu of Designs for Reinforcement Learning Over Time, 1995.
[80] Steven J. Bradtke, et al. Incremental dynamic programming for on-line adaptive optimal control, 1995.
[81] Sebastian Thrun, et al. Issues in Using Function Approximation for Reinforcement Learning, 1999.
[82] Satinder Singh. Transfer of learning by composing solutions of elemental sequential tasks, 1992, Machine Learning.
[83] John N. Tsitsiklis, et al. Asynchronous Stochastic Approximation and Q-Learning, 1994, Machine Learning.
[84] Gerald Tesauro, et al. Practical issues in temporal difference learning, 1992, Machine Learning.
[85] Terrence J. Sejnowski, et al. TD(λ) Converges with Probability 1, 1994, Machine Learning.
[86] Satinder Singh, et al. An upper bound on the loss from approximate optimal-value functions, 1994, Machine Learning.
[87] Long Ji Lin, et al. Self-improving reactive agents based on reinforcement learning, planning and teaching, 1992, Machine Learning.
[88] Richard S. Sutton, et al. Landmark learning: An illustration of associative search, 1981, Biological Cybernetics.
[89] Justin A. Boyan, et al. Modular Neural Networks for Learning Context-Dependent Game Strategies, 2007.
[90] J. Walrand, et al. Distributed Dynamic Programming, 1982.