Learning to Explore and Exploit in POMDPs