MDPs: Learning in Varying Environments
[1] Andrew G. Barto, et al. Discrete and Continuous Models, 1978.
[2] Geoffrey E. Hinton, et al. Feudal Reinforcement Learning, 1992, NIPS.
[3] Satinder P. Singh, et al. Scaling Reinforcement Learning Algorithms by Learning Variable Temporal Resolution Models, 1992, ML.
[4] Sridhar Mahadevan, et al. Automatic Programming of Behavior-Based Robots Using Reinforcement Learning, 1991, Artif. Intell.
[5] Narendra Ahuja, et al. Gross motion planning—a survey, 1992, CSUR.
[6] Andrew G. Barto, et al. Convergence of Indirect Adaptive Asynchronous Value Iteration Algorithms, 1993, NIPS.
[7] Leslie Pack Kaelbling, et al. Hierarchical Learning in Stochastic Domains: Preliminary Results, 1993, ICML.
[8] Piero Mussio, et al. Toward a Practice of Autonomous Systems, 1994.
[9] Michael I. Jordan, et al. Massachusetts Institute of Technology Artificial Intelligence Laboratory and Center for Biological and Computational Learning, Department of Brain and Cognitive Sciences, 1996.
[10] George H. John. When the Best Move Isn't Optimal: Q-learning with Exploration, 1994, AAAI.
[11] Michael L. Littman, et al. Markov Games as a Framework for Multi-Agent Reinforcement Learning, 1994, ICML.
[12] Martin L. Puterman, et al. Markov Decision Processes: Discrete Stochastic Dynamic Programming, 1994.
[13] Matthias Heger, et al. Consideration of Risk in Reinforcement Learning, 1994, ICML.
[14] Katsuhisa Furuta, et al. Robust swing up control of double pendulum, 1995, Proceedings of the 1995 American Control Conference (ACC'95).
[15] Kenji Doya, et al. Temporal Difference Learning in Continuous Time and Space, 1995, NIPS.
[16] Ben J. A. Kröse, et al. Learning from delayed rewards, 1995, Robotics Auton. Syst.
[17] András Lőrincz, et al. Self-Organizing Multi-Resolution Grid for Motion Planning and Control, 1996, Int. J. Neural Syst.
[18] Csaba Szepesvári, et al. Generalized Markov Decision Processes: Dynamic-Programming and Reinforcement-Learning Algorithms, 1996.
[19] Doina Precup, et al. Multi-time Models for Temporally Abstract Planning, 1997, NIPS.
[20] András Lőrincz, et al. Neurocontroller using dynamic state feedback for compensatory control, 1997, Neural Networks.
[21] Maja J. Matarić, et al. Behavior-Based Control: Examples from Navigation, Learning, and Group Behavior, 1997.
[22] Maja J. Matarić, et al. Behaviour-based control: examples from navigation, learning, and group behaviour, 1997, J. Exp. Theor. Artif. Intell.
[23] Csaba Szepesvári, et al. An integrated architecture for motion-control and path-planning, 1998.
[24] R. Sutton. Between MDPs and Semi-MDPs: Learning, Planning, and Representing Knowledge at Multiple Temporal Scales, 1998.
[25] Doina Precup, et al. Between MDPs and Semi-MDPs: Learning, Planning & Representing Knowledge at Multiple Temporal Scales, 1998.
[26] Csaba Szepesvári. Static and Dynamic Aspects of Optimal Sequential Decision Making, 1998.
[27] Richard S. Sutton, et al. Introduction to Reinforcement Learning, 1998.
[28] Csaba Szepesvári, et al. Approximate Inverse-Dynamics Based Robust Control Using Static and Dynamic Feedback, 1998.
[29] András Lőrincz, et al. An integrated architecture for motion-control and path-planning, 1998, J. Field Robotics.
[30] Thomas G. Dietterich. Hierarchical Reinforcement Learning with the MAXQ Value Function Decomposition, 1999, J. Artif. Intell. Res.
[31] Kenji Doya, et al. Reinforcement Learning in Continuous Time and Space, 2000, Neural Computation.
[32] Robert Givan, et al. Bounded-parameter Markov decision processes, 2000, Artif. Intell.
[33] S. H. G. ten Hagen. Continuous State Space Q-Learning for Control of Nonlinear Systems, 2001.
[34] Frank van Harmelen, et al. Proceedings of the 15th European Conference on Artificial Intelligence, 2002.
[35] András Lőrincz, et al. Event-learning with a non-Markovian controller, 2002.
[36] András Lőrincz, et al. Reinforcement Learning Integrated with a Non-Markovian Controller, 2002, ECAI.
[37] András Lőrincz, et al. Event-learning and robust policy heuristics, 2003, Cognitive Systems Research.
[38] Peter Dayan, et al. Q-learning, 1992, Machine Learning.
[39] John N. Tsitsiklis, et al. Asynchronous Stochastic Approximation and Q-Learning, 1994, Machine Learning.
[40] András Lőrincz, et al. Module-Based Reinforcement Learning: Experiments with a Real Robot, 1998, Machine Learning.
[41] Sean R. Eddy. What is dynamic programming?, 2004, Nature Biotechnology.