Bayes-Adaptive Simulation-based Search with Value Function Approximation
Peter Dayan | David Silver | Nicolas Heess | Arthur Guez
[1] Christian M. Ernst, et al. Multi-armed Bandit Allocation Indices, 1989.
[2] J. Bather, et al. Multi‐Armed Bandit Allocation Indices, 1990.
[3] N. Gordon, et al. Novel approach to nonlinear/non-Gaussian Bayesian state estimation, 1993.
[4] Stuart J. Russell, et al. Approximating Optimal Policies for Partially Observable Stochastic Domains, 1995, IJCAI.
[5] Sebastian Thrun, et al. Monte Carlo POMDPs, 1999, NIPS.
[6] Andrew G. Barto, et al. Optimal learning: computational procedures for Bayes-adaptive Markov decision processes, 2002.
[7] Joelle Pineau, et al. Point-based value iteration: An anytime algorithm for POMDPs, 2003, IJCAI.
[8] Michael O. Duff, et al. Design for an Optimal Probe, 2003, ICML.
[9] Tao Wang, et al. Bayesian sparse sampling for on-line reward optimization, 2005, ICML.
[10] Pascal Poupart, et al. Point-Based Value Iteration for Continuous POMDPs, 2006, J. Mach. Learn. Res.
[11] Joelle Pineau, et al. Model-Based Bayesian Reinforcement Learning in Large Structured Domains, 2008, UAI.
[12] David Hsu, et al. SARSOP: Efficient Point-Based POMDP Planning by Approximating Optimally Reachable Belief Spaces, 2008, Robotics: Science and Systems.
[13] Shalabh Bhatnagar, et al. Fast gradient-descent methods for temporal-difference learning with linear function approximation, 2009, ICML '09.
[14] Carl E. Rasmussen, et al. Gaussian process dynamic programming, 2009, Neurocomputing.
[15] Brahim Chaib-draa, et al. Bayesian reinforcement learning in continuous POMDPs with Gaussian processes, 2009, 2009 IEEE/RSJ International Conference on Intelligent Robots and Systems.
[16] Shalabh Bhatnagar, et al. Toward Off-Policy Learning Control with Function Approximation, 2010, ICML.
[17] Joel Veness, et al. Monte-Carlo Planning in Large POMDPs, 2010, NIPS.
[18] Michael L. Littman, et al. Learning is planning: near Bayes-optimal reinforcement learning via Monte-Carlo tree search, 2011, UAI.
[19] Carl E. Rasmussen, et al. PILCO: A Model-Based and Data-Efficient Approach to Policy Search, 2011, ICML.
[20] Regina Barzilay, et al. Learning to Win by Reading Manuals in a Monte-Carlo Framework, 2011, ACL.
[21] D. Bertsekas. Approximate policy iteration: a survey and some new methods, 2011.
[22] David Hsu, et al. Monte Carlo Bayesian Reinforcement Learning, 2012, ICML.
[23] Peter Dayan, et al. Efficient Bayes-Adaptive Reinforcement Learning using Sample-Based Search, 2012, NIPS.
[24] Richard S. Sutton, et al. Temporal-difference search in computer Go, 2012, Machine Learning.
[25] Lucian Busoniu, et al. Optimistic planning for belief-augmented Markov Decision Processes, 2013, 2013 IEEE Symposium on Adaptive Dynamic Programming and Reinforcement Learning (ADPRL).