Potential-based Shaping in Model-based Reinforcement Learning
[1] Martin L. Puterman, et al. Markov Decision Processes: Discrete Stochastic Dynamic Programming, 1994.
[2] Peter Norvig, et al. Artificial Intelligence: A Modern Approach, 1995.
[3] Andrew G. Barto, et al. Learning to Act Using Real-Time Dynamic Programming, 1995, Artif. Intell.
[4] Thomas G. Dietterich. What is machine learning?, 2020, Archives of Disease in Childhood.
[5] Andrew Y. Ng, et al. Policy Invariance Under Reward Transformations: Theory and Application to Reward Shaping, 1999, ICML.
[6] S. Zilberstein, et al. Solving Markov Decision Problems Using Heuristic Search, 1999.
[7] Dale Schuurmans, et al. Algorithm-Directed Exploration for Model-Based Reinforcement Learning in Factored MDPs, 2002, ICML.
[8] Ronen I. Brafman, et al. R-MAX - A General Polynomial Time Algorithm for Near-Optimal Reinforcement Learning, 2001, J. Mach. Learn. Res.
[9] Sham M. Kakade, et al. On the sample complexity of reinforcement learning, 2003.
[10] Eric Wiewiora, et al. Potential-Based Shaping and Q-Value Initialization are Equivalent, 2003, J. Artif. Intell. Res.
[11] Peter Dayan, et al. Q-learning, 1992, Machine Learning.
[12] Alexander L. Strehl, et al. PAC Reinforcement Learning Bounds for RTDP and Rand-RTDP Technical Report, 2006.
[13] Michael L. Littman, et al. Efficient Reinforcement Learning with Relocatable Action Models, 2007, AAAI.