Optimistic planning for belief-augmented Markov Decision Processes