Scalable and Efficient Bayes-Adaptive Reinforcement Learning Based on Monte-Carlo Tree Search
Arthur Guez | David Silver | Peter Dayan
[1] W. R. Thompson. On the Likelihood that One Unknown Probability Exceeds Another in View of the Evidence of Two Samples, 1933.
[2] R. Bellman. On the Theory of Dynamic Programming, 1952, Proceedings of the National Academy of Sciences of the United States of America.
[3] E. Silver. Markovian Decision Processes with Uncertain Transition Probabilities or Rewards, 1963.
[4] Nils J. Nilsson, et al. A Formal Basis for the Heuristic Determination of Minimum Cost Paths, 1968, IEEE Trans. Syst. Sci. Cybern.
[5] P. Randolph. Bayesian Decision Problems and Markov Chains, 1968.
[6] J. K. Satia, et al. Markovian Decision Processes with Uncertain Transition Probabilities, 1973, Oper. Res.
[7] Sheldon M. Ross. Introduction to Stochastic Dynamic Programming: Probability and Mathematical Statistics, 1983.
[8] Christian M. Ernst, et al. Multi-armed Bandit Allocation Indices, 1989.
[9] J. Bather, et al. Multi-Armed Bandit Allocation Indices, 1990.
[10] Richard S. Sutton, et al. Integrated Architectures for Learning, Planning, and Reacting Based on Approximating Dynamic Programming, 1990, ML.
[11] Jürgen Schmidhuber, et al. Curious model-building control systems, 1991, [Proceedings] 1991 IEEE International Joint Conference on Neural Networks.
[12] Ben J. A. Kröse, et al. Learning from delayed rewards, 1995, Robotics Auton. Syst.
[13] Andrew G. Barto, et al. Reinforcement learning, 1998.
[14] Scott Moore. Applying Online Search Techniques to Reinforcement Learning, 1998.
[15] Yoram Singer, et al. Efficient Bayesian Parameter Estimation in Large Discrete Domains, 1998, NIPS.
[16] Richard S. Sutton, et al. Introduction to Reinforcement Learning, 1998.
[17] Stuart J. Russell, et al. Bayesian Q-Learning, 1998, AAAI/IAAI.
[18] Malcolm J. A. Strens, et al. A Bayesian Framework for Reinforcement Learning, 2000, ICML.
[19] Eric Allender, et al. Complexity of finite-horizon Markov decision process problems, 2000, JACM.
[20] Andrew G. Barto, et al. Optimal learning: computational procedures for Bayes-adaptive Markov decision processes, 2002.
[21] Ronen I. Brafman, et al. R-MAX - A General Polynomial Time Algorithm for Near-Optimal Reinforcement Learning, 2001, J. Mach. Learn. Res.
[22] Michael O. Duff, et al. Design for an Optimal Probe, 2003, ICML.
[23] Anne Condon, et al. On the undecidability of probabilistic planning and related stochastic optimization problems, 2003, Artif. Intell.
[24] Terrence J. Sejnowski, et al. Exploration Bonuses and Dual Control, 1996, Machine Learning.
[25] Peter Auer, et al. Finite-time Analysis of the Multiarmed Bandit Problem, 2002, Machine Learning.
[26] Yishay Mansour, et al. A Sparse Sampling Algorithm for Near-Optimal Planning in Large Markov Decision Processes, 1999, Machine Learning.
[27] Paul Bourgine, et al. Exploration of Multi-State Environments: Local Measures and Back-Propagation of Uncertainty, 1999, Machine Learning.
[28] Tao Wang, et al. Bayesian sparse sampling for on-line reward optimization, 2005, ICML.
[29] Jesse Hoey, et al. An analytic solution to discrete Bayesian reinforcement learning, 2006, ICML.
[30] Csaba Szepesvári, et al. Bandit Based Monte-Carlo Planning, 2006, ECML.
[31] Doina Precup, et al. Using Linear Programming for Bayesian Exploration in Markov Decision Processes, 2007, IJCAI.
[32] David Silver, et al. Combining online and offline knowledge in UCT, 2007, ICML '07.
[33] Rémi Munos, et al. Bandit Algorithms for Tree Search, 2007, UAI.
[34] Joelle Pineau, et al. Online Planning Algorithms for POMDPs, 2008, J. Artif. Intell. Res.
[35] Peter Auer, et al. Near-optimal Regret Bounds for Reinforcement Learning, 2008, J. Mach. Learn. Res.
[36] Richard S. Sutton, et al. Sample-based learning and search with permanent and transient memories, 2008, ICML '08.
[37] Andrew Y. Ng, et al. Near-Bayesian exploration in polynomial time, 2009, ICML '09.
[38] Lihong Li, et al. Reinforcement Learning in Finite MDPs: PAC Analysis, 2009, J. Mach. Learn. Res.
[39] Lihong Li, et al. A Bayesian Sampling Approach to Exploration in Reinforcement Learning, 2009, UAI.
[40] Rémi Munos, et al. Pure Exploration in Multi-armed Bandits Problems, 2009, ALT.
[41] H. Kappen, et al. Optimal exploration as a symmetry breaking phenomenon, 2010.
[42] Joshua B. Tenenbaum, et al. Nonparametric Bayesian Policy Priors for Reinforcement Learning, 2010, NIPS.
[43] Doina Precup, et al. Smarter Sampling in Model-Based Bayesian Reinforcement Learning, 2010, ECML/PKDD.
[44] Thomas J. Walsh, et al. Integrating Sample-Based Planning and Model-Based Reinforcement Learning, 2010, AAAI.
[45] Csaba Szepesvári, et al. Algorithms for Reinforcement Learning, 2010, Synthesis Lectures on Artificial Intelligence and Machine Learning.
[46] Joel Veness, et al. Monte-Carlo Planning in Large POMDPs, 2010, NIPS.
[47] Radford M. Neal. Probabilistic Inference Using Markov Chain Monte Carlo Methods, 2011.
[48] Michael L. Littman, et al. Learning is planning: near Bayes-optimal reinforcement learning via Monte-Carlo tree search, 2011, UAI.
[49] Leslie Pack Kaelbling, et al. Bayesian Policy Search with Policy Priors, 2011, IJCAI.
[50] Joelle Pineau, et al. A Bayesian Approach for Learning and Planning in Partially Observable Markov Decision Processes, 2011, J. Mach. Learn. Res.
[51] Olivier Buffet, et al. Near-Optimal BRL using Optimistic Local Transitions, 2012, ICML.
[52] Viet-Hung Dang, et al. Monte-Carlo tree search for Bayesian reinforcement learning, 2012, 2012 11th International Conference on Machine Learning and Applications.
[53] David Hsu, et al. Monte Carlo Bayesian Reinforcement Learning, 2012, ICML.
[54] Peter Dayan, et al. Efficient Bayes-Adaptive Reinforcement Learning using Sample-Based Search, 2012, NIPS.
[55] Michèle Sebag, et al. The grand challenge of computer Go, 2012, Commun. ACM.
[56] Shipra Agrawal, et al. Analysis of Thompson Sampling for the Multi-armed Bandit Problem, 2011, COLT.
[57] Lucian Busoniu, et al. Optimistic planning for belief-augmented Markov Decision Processes, 2013, 2013 IEEE Symposium on Adaptive Dynamic Programming and Reinforcement Learning (ADPRL).