Monte-Carlo tree search for Bayesian reinforcement learning
[1] Jianghao Li,et al. Microassembly path planning using reinforcement learning for improving positioning accuracy of a 1 cm³ omni-directional mobile microrobot, 2011, Applied Intelligence.
[2] Joel Veness,et al. Monte-Carlo Planning in Large POMDPs , 2010, NIPS.
[3] Dimitri P. Bertsekas,et al. Reinforcement Learning for Dynamic Channel Allocation in Cellular Telephone Systems , 1996, NIPS.
[4] Lihong Li,et al. A Bayesian Sampling Approach to Exploration in Reinforcement Learning , 2009, UAI.
[5] Gerald Tesauro,et al. TD-Gammon, a Self-Teaching Backgammon Program, Achieves Master-Level Play , 1994, Neural Computation.
[6] Peter Auer,et al. Finite-time Analysis of the Multiarmed Bandit Problem , 2002, Machine Learning.
[7] Joelle Pineau,et al. Model-Based Bayesian Reinforcement Learning in Large Structured Domains , 2008, UAI.
[8] Thomas J. Walsh,et al. Integrating Sample-Based Planning and Model-Based Reinforcement Learning , 2010, AAAI.
[9] Andrew G. Barto,et al. Optimal learning: computational procedures for Bayes-adaptive Markov decision processes, 2002.
[10] David Silver,et al. Combining online and offline knowledge in UCT , 2007, ICML '07.
[11] Doina Precup,et al. Using Linear Programming for Bayesian Exploration in Markov Decision Processes , 2007, IJCAI.
[12] Michael Kearns,et al. Near-Optimal Reinforcement Learning in Polynomial Time , 2002, Machine Learning.
[13] Vittaldas V. Prabhu,et al. Distributed Reinforcement Learning Control for Batch Sequencing and Sizing in Just-In-Time Manufacturing Systems , 2004, Applied Intelligence.
[14] Jesse Hoey,et al. An analytic solution to discrete Bayesian reinforcement learning , 2006, ICML.
[15] Paloma Martínez,et al. Learning teaching strategies in an Adaptive and Intelligent Educational System through Reinforcement Learning , 2009, Applied Intelligence.
[16] Ole-Christoffer Granmo,et al. Accelerated Bayesian learning for decentralized two-armed bandit based decision making with applications to the Goore Game , 2013, Applied Intelligence.
[17] John Langford,et al. Exploration in Metric State Spaces , 2003, ICML.
[18] Nguyen Hoang Viet,et al. Policy Gradient SMDP for Resource Allocation and Routing in Integrated Services Networks, 2009.
[19] Peter Norvig,et al. Artificial Intelligence: A Modern Approach, 1995.
[20] Ronen I. Brafman,et al. R-MAX - A General Polynomial Time Algorithm for Near-Optimal Reinforcement Learning , 2001, J. Mach. Learn. Res.
[21] Shie Mannor,et al. Bayes Meets Bellman: The Gaussian Process Approach to Temporal Difference Learning , 2003, ICML.
[22] Richard S. Sutton,et al. Reinforcement Learning: An Introduction , 1998, IEEE Trans. Neural Networks.
[23] Arthur L. Samuel,et al. Some studies in machine learning using the game of checkers , 2000, IBM J. Res. Dev.
[24] Mohammad Ghavamzadeh,et al. Bayesian actor-critic algorithms , 2007, ICML '07.
[25] Mohammad Ghavamzadeh,et al. Bayesian Policy Gradient Algorithms , 2006, NIPS.
[26] Maziar Palhang,et al. Multi-criteria expertness based cooperative Q-learning , 2012, Applied Intelligence.
[27] Wei Zhang,et al. A Reinforcement Learning Approach to job-shop Scheduling , 1995, IJCAI.
[28] Gerald Tesauro,et al. Practical Issues in Temporal Difference Learning , 1992, Mach. Learn.
[29] Gerald Tesauro,et al. Temporal difference learning and TD-Gammon , 1995, CACM.
[30] Michael L. Littman,et al. Learning is planning: near Bayes-optimal reinforcement learning via Monte-Carlo tree search , 2011, UAI.
[31] Nan Rong,et al. What makes some POMDP problems easy to approximate? , 2007, NIPS.
[32] Csaba Szepesvári,et al. Bandit Based Monte-Carlo Planning , 2006, ECML.
[33] Joelle Pineau,et al. Bayes-Adaptive POMDPs , 2007, NIPS.
[34] Andrew Y. Ng,et al. Near-Bayesian exploration in polynomial time , 2009, ICML '09.
[35] Andrew Tridgell,et al. Learning to Play Chess Using Temporal Differences , 2000, Machine Learning.
[36] Michael L. Littman,et al. An analysis of model-based Interval Estimation for Markov Decision Processes , 2008, J. Comput. Syst. Sci.
[37] Tao Wang,et al. Bayesian sparse sampling for on-line reward optimization , 2005, ICML.
[38] TaeChoong Chung,et al. Hessian matrix distribution for Bayesian policy gradient reinforcement learning , 2011, Inf. Sci.
[39] Csaba Szepesvári,et al. Model-based reinforcement learning with nearly tight exploration complexity bounds , 2010, ICML.
[40] Shie Mannor,et al. Reinforcement learning with Gaussian processes , 2005, ICML.