Potential-Based Reward Shaping for POMDPs (Extended Abstract)
Adam Eck | Leen-Kiat Soh | Sam Devlin | Daniel Kudenko