Action Refinement in Reinforcement Learning by Probability Smoothing
Thomas G. Dietterich | Dídac Busquets | Ramón López de Mántaras | Carles Sierra
[1] Peter Norvig, et al. Artificial Intelligence: A Modern Approach, 1995.
[2] Andrew W. Moore, et al. Prioritized Sweeping: Reinforcement Learning with Less Data and Less Time, 1993, Machine Learning.
[3] Dimitri P. Bertsekas, et al. Dynamic Programming and Optimal Control, Two Volume Set, 1995.
[4] Dídac Busquets, et al. Reinforcement learning for landmark-based robot navigation, 2002, AAMAS '02.
[5] Stuart J. Russell, et al. Reinforcement Learning with Hierarchies of Machines, 1997, NIPS.
[6] Michael L. Littman, et al. Packet Routing in Dynamically Changing Networks: A Reinforcement Learning Approach, 1993, NIPS.
[7] Thomas G. Dietterich. Hierarchical Reinforcement Learning with the MAXQ Value Function Decomposition, 1999, J. Artif. Intell. Res.
[8] Doina Precup, et al. Between MDPs and Semi-MDPs: A Framework for Temporal Abstraction in Reinforcement Learning, 1999, Artif. Intell.