Hierarchical Average Reward Reinforcement Learning Hierarchical Average Reward Reinforcement Learning
暂无分享,去创建一个
[1] Vivek S. Borkar,et al. Learning Algorithms for Markov Decision Processes with Average Cost , 2001, SIAM J. Control. Optim..
[2] Prasad Tadepalli,et al. Model-Based Average Reward Reinforcement Learning , 1998, Artif. Intell..
[3] Sridhar Mahadevan,et al. Recent Advances in Hierarchical Reinforcement Learning , 2003, Discret. Event Dyn. Syst..
[4] Prasad Tadepalli,et al. Auto-Exploratory Average Reward Reinforcement Learning , 1996, AAAI/IAAI, Vol. 1.
[5] Doina Precup,et al. Between MDPs and Semi-MDPs: A Framework for Temporal Abstraction in Reinforcement Learning , 1999, Artif. Intell..
[6] Model-based Hierarchical Average-reward Reinforcement Learning , 2002, ICML.
[7] S. Shankar Sastry,et al. Autonomous Helicopter Flight via Reinforcement Learning , 2003, NIPS.
[8] David Andre,et al. State abstraction for programmable reinforcement learning agents , 2002, AAAI/IAAI.
[9] Geoffrey E. Hinton,et al. Feudal Reinforcement Learning , 1992, NIPS.
[10] Sridhar Mahadevan,et al. Hierarchically Optimal Average Reward Reinforcement Learning , 2002, ICML.
[11] Ronald E. Parr,et al. Hierarchical control and learning for markov decision processes , 1998 .
[12] Gerald Tesauro,et al. TD-Gammon, a Self-Teaching Backgammon Program, Achieves Master-Level Play , 1994, Neural Computation.
[13] Thomas G. Dietterich. Hierarchical Reinforcement Learning with the MAXQ Value Function Decomposition , 1999, J. Artif. Intell. Res..
[14] Benjamin Van Roy. Learning and value function approximation in complex decision processes , 1998 .
[15] Satinder Singh. Transfer of learning by composing solutions of elemental sequential tasks , 2004, Machine Learning.
[16] Leslie Pack Kaelbling,et al. Learning to Achieve Goals , 1993, IJCAI.
[17] Sridhar Mahadevan,et al. Continuous-Time Hierarchical Reinforcement Learning , 2001, ICML.
[18] Andrew Y. Ng,et al. Policy Invariance Under Reward Transformations: Theory and Application to Reward Shaping , 1999, ICML.
[19] Ronald A. Howard,et al. Dynamic Programming and Markov Processes , 1960 .
[20] Michael O. Duff,et al. Reinforcement Learning Methods for Continuous-Time Markov Decision Problems , 1994, NIPS.
[21] Doina Precup,et al. Temporal abstraction in reinforcement learning , 2000, ICML 2000.
[22] Anton Schwartz,et al. A Reinforcement Learning Method for Maximizing Undiscounted Rewards , 1993, ICML.
[23] Sridhar Mahadevan,et al. Automatic Programming of Behavior-Based Robots Using Reinforcement Learning , 1991, Artif. Intell..
[24] Sridhar Mahadevan,et al. Average reward reinforcement learning: Foundations, algorithms, and empirical results , 2004, Machine Learning.
[25] Wei Zhang,et al. A Reinforcement Learning Approach to job-shop Scheduling , 1995, IJCAI.
[26] David Andre,et al. Programmable Reinforcement Learning Agents , 2000, NIPS.
[27] Abhijit Gosavi,et al. Self-Improving Factory Simulation using Continuous-time Average-Reward Reinforcement Learning , 2007 .
[28] Gang Wang,et al. Hierarchical Optimization of Policy-Coupled Semi-Markov Decision Processes , 1999, ICML.
[29] Andrew G. Barto,et al. Elevator Group Control Using Multiple Reinforcement Learning Agents , 1998, Machine Learning.
[30] Dimitri P. Bertsekas,et al. Reinforcement Learning for Dynamic Channel Allocation in Cellular Telephone Systems , 1996, NIPS.
[31] Prasad Tadepalli,et al. Scaling Up Average Reward Reinforcement Learning by Approximating the Domain Models and the Value Function , 1996, ICML.
[32] Sean R Eddy,et al. What is dynamic programming? , 2004, Nature Biotechnology.