Hierarchical Control and Learning for Markov Decision Processes
[1] Rutherford Aris, et al. Discrete Dynamic Programming, 1965, The Mathematical Gazette.
[2] Austin Tate, et al. Generating Project Networks, 1977, IJCAI.
[3] D. Rose, et al. Generalized Nested Dissection, 1977.
[4] P. Varaiya, et al. Multilayer Control of Large Markov Chains, 1978.
[5] Rodney A. Brooks, et al. A Robust Layered Control System for a Mobile Robot, 1986, IEEE Journal on Robotics and Automation.
[6] John N. Tsitsiklis, et al. The Complexity of Markov Decision Processes, 1987, Math. Oper. Res.
[7] John N. Tsitsiklis, et al. Parallel and Distributed Computation, 1989.
[8] Gerald Tesauro, et al. Neurogammon Wins Computer Olympiad, 1989, Neural Computation.
[9] C. Golaszewski. On the Supervisory Control of Discrete Event Systems, 1989.
[10] Richard S. Sutton, et al. Integrated Architectures for Learning, Planning, and Reacting Based on Approximating Dynamic Programming, 1990, ML.
[11] W. Lovejoy. A Survey of Algorithmic Methods for Partially Observed Markov Decision Processes, 1991.
[12] Geoffrey E. Hinton, et al. Feudal Reinforcement Learning, 1992, NIPS.
[13] Long-Ji Lin, et al. Reinforcement Learning for Robots Using Neural Networks, 1992.
[14] Sridhar Mahadevan, et al. Automatic Programming of Behavior-Based Robots Using Reinforcement Learning, 1991, Artif. Intell.
[15] Ronald J. Williams, et al. Tight Performance Bounds on Greedy Policies Based on Imperfect Value Functions, 1993.
[16] Craig Boutilier, et al. Using Abstractions for Decision-Theoretic Planning with Time Constraints, 1994, AAAI.
[17] Andrew W. Moore, et al. Generalization in Reinforcement Learning: Safely Approximating the Value Function, 1994, NIPS.
[18] Michael I. Jordan, et al. MIT Artificial Intelligence Laboratory and Center for Biological and Computational Learning, Department of Brain and Cognitive Sciences, 1996.
[19] Leslie Pack Kaelbling, et al. Acting Optimally in Partially Observable Stochastic Domains, 1994, AAAI.
[20] Michael O. Duff, et al. Reinforcement Learning Methods for Continuous-Time Markov Decision Problems, 1994, NIPS.
[21] Michael I. Jordan, et al. Reinforcement Learning Algorithm for Partially Observable Markov Decision Problems, 1994, NIPS.
[22] Nils J. Nilsson, et al. Teleo-Reactive Programs for Agent Control, 1993, J. Artif. Intell. Res.
[23] Sebastian Thrun, et al. Finding Structure in Reinforcement Learning, 1994, NIPS.
[24] Leslie Pack Kaelbling, et al. Planning under Time Constraints in Stochastic Domains, 1993, Artif. Intell.
[25] Geoffrey J. Gordon. Stable Function Approximation in Dynamic Programming, 1995, ICML.
[26] Peter Norvig, et al. Artificial Intelligence: A Modern Approach, 1995.
[27] Thomas Dean, et al. Decomposition Techniques for Planning in Stochastic Domains, 1995, IJCAI.
[28] Leslie Pack Kaelbling, et al. On the Complexity of Solving Markov Decision Problems, 1995, UAI.
[29] Thomas G. Dietterich, et al. High-Performance Job-Shop Scheduling with a Time-Delay TD(λ) Network, 1995, NIPS.
[30] Leslie Pack Kaelbling, et al. Learning Policies for Partially Observable Environments: Scaling Up, 1995, ICML.
[31] Andrew G. Barto, et al. Improving Elevator Performance Using Reinforcement Learning, 1995, NIPS.
[32] Craig Boutilier, et al. Exploiting Structure in Policy Construction, 1995, IJCAI.
[33] Leemon C. Baird, et al. Residual Algorithms: Reinforcement Learning with Function Approximation, 1995, ICML.
[34] Marco Wiering, et al. HQ-Learning: Discovering Markovian Subgoals for Non-Markovian Reinforcement Learning, 1996.
[35] Andrew W. Moore, et al. Reinforcement Learning: A Survey, 1996, J. Artif. Intell. Res.
[36] John N. Tsitsiklis, et al. Neuro-Dynamic Programming, 1996, Athena Scientific.
[37] Michael L. Littman, et al. Algorithms for Sequential Decision Making, 1996.
[38] Csaba Szepesvári, et al. Generalized Markov Decision Processes: Dynamic-Programming and Reinforcement-Learning Algorithms, 1996.
[39] David Andre, et al. Generalized Prioritized Sweeping, 1997, NIPS.
[40] Daishi Harada, et al. Reinforcement Learning with Time, 1997, AAAI/IAAI.
[41] Eric A. Hansen, et al. An Improved Policy Iteration Algorithm for Partially Observable MDPs, 1997, NIPS.
[42] Doina Precup, et al. Multi-time Models for Temporally Abstract Planning, 1997, NIPS.
[43] Robert Givan, et al. Model Minimization in Markov Decision Processes, 1997, AAAI/IAAI.
[44] Shieu-Hong Lin, et al. Exploiting Structure for Planning and Control, 1997.
[45] Robert Givan, et al. Model Reduction Techniques for Computing Approximately Optimal Solutions for Markov Decision Processes, 1997, UAI.
[46] Wenju Liu, et al. Region-Based Approximations for Planning in Stochastic Domains, 1997, UAI.
[47] Milos Hauskrecht, et al. Hierarchical Solution of Markov Decision Processes Using Macro-actions, 1998, UAI.
[48] Richard S. Sutton. Between MDPs and Semi-MDPs: Learning, Planning, and Representing Knowledge at Multiple Temporal Scales, 1998.
[49] Thomas G. Dietterich. Hierarchical Reinforcement Learning with the MAXQ Value Function Decomposition, 1999, J. Artif. Intell. Res.
[50] Doina Precup. Temporal Abstraction in Reinforcement Learning, PhD thesis, University of Massachusetts Amherst, 2000.