Learning to Solve Markovian Decision Processes
[1] Ronald A. Howard,et al. Dynamic Programming and Markov Processes , 1960 .
[2] Nils J. Nilsson,et al. A Formal Basis for the Heuristic Determination of Minimum Cost Paths , 1968, IEEE Trans. Syst. Sci. Cybern..
[3] F. Downton. Stochastic Approximation , 1969, Nature.
[4] Donald E. Kirk,et al. Optimal control theory : an introduction , 1970 .
[5] Richard Fikes,et al. Learning and Executing Generalized Robot Plans , 1993, Artif. Intell..
[6] Earl D. Sacerdoti,et al. Planning in a Hierarchy of Abstraction Spaces , 1974, IJCAI.
[7] Richard O. Duda,et al. Pattern classification and scene analysis , 1974, A Wiley-Interscience publication.
[8] P. Werbos,et al. Beyond Regression: New Tools for Prediction and Analysis in the Behavioral Sciences , 1974 .
[9] Peter E. Hart,et al. Pattern classification and scene analysis , 1974, A Wiley-Interscience publication.
[10] Martin L. Puterman,et al. THE ANALYTIC THEORY OF POLICY ITERATION , 1978 .
[11] M. Puterman,et al. Modified Policy Iteration Algorithms for Discounted Markov Decision Problems , 1978 .
[12] Drew McDermott,et al. Planning and Acting , 1978, Cogn. Sci..
[13] R. Korf. Learning to solve problems by searching for macro-operators , 1983 .
[14] Richard S. Sutton,et al. Temporal credit assignment in reinforcement learning , 1984 .
[15] Graham C. Goodwin,et al. Adaptive filtering prediction and control , 1984 .
[16] P. Anandan,et al. Pattern-recognizing stochastic learning automata , 1985, IEEE Transactions on Systems, Man, and Cybernetics.
[17] Geoffrey E. Hinton,et al. Learning internal representations by error propagation , 1986 .
[18] Charles W. Anderson,et al. Learning and problem-solving with multilayer connectionist systems (adaptive, strategy learning, neural networks, reinforcement learning) , 1986 .
[19] James L. McClelland,et al. Parallel Distributed Processing: Explorations in the Microstructure of Cognition : Psychological and Biological Models , 1986 .
[20] Dimitri P. Bertsekas,et al. Dynamic Programming: Deterministic and Stochastic Models , 1987 .
[21] Lawrence D. Jackel,et al. Large Automatic Learning, Rule Extraction, and Generalization , 1987, Complex Syst..
[22] Paul J. Werbos,et al. Building and Understanding Adaptive Systems: A Statistical/Numerical Approach to Factory Automation and Brain Research , 1987, IEEE Transactions on Systems, Man, and Cybernetics.
[23] Ronald L. Rivest,et al. Game Tree Searching by Min/Max Approximation , 1987, Artif. Intell..
[24] John H. Holland,et al. Induction: Processes of Inference, Learning, and Discovery , 1987, IEEE Expert.
[25] David E. Goldberg,et al. Genetic Algorithms in Search Optimization and Machine Learning , 1988 .
[26] Philip E. Agre,et al. The dynamic structure of everyday life , 1988 .
[27] Robert A. Jacobs,et al. Increased rates of convergence through learning rate adaptation , 1987, Neural Networks.
[28] Richard S. Sutton,et al. Sequential Decision Problems and Neural Networks , 1989, NIPS 1989.
[29] R. Sutton,et al. Connectionist Learning for Control: An Overview , 1989 .
[30] Kumpati S. Narendra,et al. Learning automata - an introduction , 1989 .
[31] John N. Tsitsiklis,et al. Parallel and distributed computation , 1989 .
[32] A. Barto,et al. Learning and Sequential Decision Making , 1989 .
[33] Christian Lebiere,et al. The Cascade-Correlation Learning Architecture , 1989, NIPS.
[34] Michael I. Jordan,et al. Learning to Control an Unstable System with Forward Modeling , 1989, NIPS.
[35] John J. Grefenstette,et al. Incremental Learning of Control Strategies with Genetic algorithms , 1989, ML.
[36] Rodney A. Brooks,et al. A robot that walks; emergent behaviors from a carefully evolved network , 1989, Proceedings, 1989 International Conference on Robotics and Automation.
[37] L. Baird,et al. A MATHEMATICAL ANALYSIS OF ACTOR-CRITIC ARCHITECTURES FOR LEARNING OPTIMAL CONTROLS THROUGH INCREMENTAL DYNAMIC PROGRAMMING , 1990 .
[38] Michael I. Jordan,et al. Task Decomposition Through Competition in a Modular Connectionist Architecture , 1990 .
[39] Paul J. Werbos,et al. Neurocontrol and related techniques , 1990 .
[40] Pattie Maes,et al. Designing autonomous agents: Theory and practice from biology to engineering and back , 1990, Robotics Auton. Syst..
[41] Paul E. Utgoff,et al. Explaining Temporal Differences to Create Useful Concepts for Evaluating States , 1990, AAAI.
[42] Richard E. Korf,et al. Real-Time Heuristic Search , 1990, Artif. Intell..
[43] Paul J. Werbos,et al. Consistency of HDP applied to a simple reinforcement learning problem , 1990, Neural Networks.
[44] Dana H. Ballard,et al. Active Perception and Reinforcement Learning , 1990, Neural Computation.
[45] Richard S. Sutton,et al. Integrated Architectures for Learning, Planning, and Reacting Based on Approximating Dynamic Programming , 1990, ML.
[46] Jonathan Bachrach,et al. A Connectionist Learning Control Architecture for Navigation , 1990, NIPS.
[47] Andrew G. Barto,et al. On the Computational Economics of Reinforcement Learning , 1991 .
[48] Michael I. Jordan,et al. Hierarchies of Adaptive Experts , 1991, NIPS.
[49] Jürgen Schmidhuber,et al. A possibility for implementing curiosity and boredom in model-building neural controllers , 1991 .
[50] Leslie Pack Kaelbling,et al. Input Generalization in Delayed Reinforcement Learning: An Algorithm and Performance Comparisons , 1991, IJCAI.
[51] Michael I. Jordan,et al. Internal World Models and Supervised Learning , 1991, ML.
[52] Roderic A. Grupen,et al. Planning grasp strategies for multifingered robot hands , 1991, Proceedings. 1991 IEEE International Conference on Robotics and Automation.
[53] Maja J. Matarić. A Comparative Analysis of Reinforcement Learning Methods , 1991 .
[54] R. A. Brooks,et al. Intelligence without Representation , 1991, Artif. Intell..
[55] Richard S. Sutton,et al. Planning by Incremental Dynamic Programming , 1991, ML.
[56] Sebastian Thrun,et al. Active Exploration in Dynamic Environments , 1991, NIPS.
[57] Satinder P. Singh,et al. Transfer of Learning Across Compositions of Sequential Tasks , 1991, ML.
[58] Michael I. Jordan,et al. Task Decomposition Through Competition in a Modular Connectionist Architecture: The What and Where Vision Tasks , 1990, Cogn. Sci..
[59] Geoffrey E. Hinton,et al. Adaptive Mixtures of Local Experts , 1991, Neural Computation.
[60] Rodney A. Brooks,et al. Intelligence Without Reason , 1991, IJCAI.
[61] Paul E. Utgoff,et al. Two Kinds of Training Information For Evaluation Function Learning , 1991, AAAI.
[62] Michael P. Wellman,et al. Planning and Control , 1991 .
[63] R.J. Williams,et al. Reinforcement learning is direct adaptive optimal control , 1991, IEEE Control Systems.
[64] S. Thrun. Efficient Exploration in Reinforcement Learning , 1992 .
[65] Geoffrey E. Hinton,et al. Feudal Reinforcement Learning , 1992, NIPS.
[66] Richard Yee,et al. Abstraction in Control Learning , 1992 .
[67] Long-Ji Lin,et al. Reinforcement learning for robots using neural networks , 1992 .
[68] Satinder P. Singh,et al. Scaling Reinforcement Learning Algorithms by Learning Variable Temporal Resolution Models , 1992, ML.
[69] Satinder Singh. The Efficient Learning of Multiple Task Sequences , 1992 .
[70] Sridhar Mahadevan,et al. Enhancing Transfer in Reinforcement Learning by Building Stochastic Models of Robot Actions , 1992, ML.
[71] Sridhar Mahadevan,et al. Automatic Programming of Behavior-Based Robots Using Reinforcement Learning , 1991, Artif. Intell..
[72] Lonnie Chrisman,et al. Reinforcement Learning with Perceptual Aliasing: The Perceptual Distinctions Approach , 1992, AAAI.
[73] Steven J. Bradtke,et al. Reinforcement Learning Applied to Linear Quadratic Regulation , 1992, NIPS.
[74] Steven Douglas Whitehead,et al. Reinforcement learning for the adaptive control of perception and action , 1992 .
[75] Satinder P. Singh,et al. Reinforcement Learning with a Hierarchy of Abstract Models , 1992, AAAI.
[76] Richard S. Sutton,et al. Adapting Bias by Gradient Descent: An Incremental Version of Delta-Bar-Delta , 1992, AAAI.
[77] Vijaykumar Gullapalli,et al. Reinforcement learning and its application to control , 1992 .
[78] Paul E. Utgoff,et al. A Teaching Method for Reinforcement Learning , 1992, ML.
[79] R. Grupen,et al. Harmonic Control , 1992 .
[80] Sebastian Thrun,et al. Explanation-Based Neural Network Learning for Robot Control , 1992, NIPS.
[81] C. Atkeson,et al. Prioritized Sweeping: Reinforcement Learning with Less Data and Less Time , 1993, Machine Learning.
[82] Leslie Pack Kaelbling,et al. Learning in embedded systems , 1993 .
[83] Tom M. Mitchell,et al. Reinforcement learning with hidden states , 1993 .
[84] Etienne Barnard,et al. Temporal-difference methods and Markov models , 1993, IEEE Trans. Syst. Man Cybern..
[85] Anton Schwartz,et al. A Reinforcement Learning Method for Maximizing Undiscounted Rewards , 1993, ICML.
[86] Andrew McCallum,et al. Overcoming Incomplete Perception with Utile Distinction Memory , 1993, ICML.
[87] Roderic A. Grupen,et al. The applications of harmonic functions to robotics , 1993, J. Field Robotics.
[88] Peter Dayan,et al. Improving Generalization for Temporal Difference Learning: The Successor Representation , 1993, Neural Computation.
[89] Jing Peng,et al. Efficient Learning and Planning Within the Dyna Framework , 1993, Adapt. Behav..
[90] Bernard Delyon,et al. Accelerated Stochastic Approximation , 1993, SIAM J. Optim..
[91] J. Peng,et al. Efficient Learning and Planning Within the Dyna Framework , 1993, IEEE International Conference on Neural Networks.
[92] G. Kane. Parallel Distributed Processing: Explorations in the Microstructure of Cognition, vol 1: Foundations, vol 2: Psychological and Biological Models , 1994 .
[93] Andrew G. Barto,et al. Reinforcement Learning and Dynamic Programming , 1995 .
[94] Ben J. A. Kröse,et al. Learning from delayed rewards , 1995, Robotics Auton. Syst..
[95] Andrew G. Barto,et al. Learning to Act Using Real-Time Dynamic Programming , 1995, Artif. Intell..