Hierarchically Optimal Average Reward Reinforcement Learning
[1] Gang Wang et al. Hierarchical Optimization of Policy-Coupled Semi-Markov Decision Processes, 1999, ICML.
[2] Prasad Tadepalli et al. Auto-Exploratory Average Reward Reinforcement Learning, 1996, AAAI/IAAI, Vol. 1.
[3] Vivek S. Borkar et al. Learning Algorithms for Markov Decision Processes with Average Cost, 2001, SIAM J. Control Optim.
[4] Michael O. Duff et al. Reinforcement Learning Methods for Continuous-Time Markov Decision Problems, 1994, NIPS.
[5] David Andre et al. State Abstraction for Programmable Reinforcement Learning Agents, 2002, AAAI/IAAI.
[6] Thomas G. Dietterich. Hierarchical Reinforcement Learning with the MAXQ Value Function Decomposition, 1999, J. Artif. Intell. Res.
[7] Sridhar Mahadevan et al. Continuous-Time Hierarchical Reinforcement Learning, 2001, ICML.
[8] Doina Precup et al. Between MDPs and Semi-MDPs: A Framework for Temporal Abstraction in Reinforcement Learning, 1999, Artif. Intell.
[9] Ronald E. Parr et al. Hierarchical Control and Learning for Markov Decision Processes, 1998.
[10] Anton Schwartz et al. A Reinforcement Learning Method for Maximizing Undiscounted Rewards, 1993, ICML.
[11] Martin L. Puterman et al. Markov Decision Processes: Discrete Stochastic Dynamic Programming, 1994.
[12] Sridhar Mahadevan et al. Average Reward Reinforcement Learning: Foundations, Algorithms, and Empirical Results, 1996, Machine Learning.