Efficient Bayesian Clustering for Reinforcement Learning
Travis Mandel | Yun-En Liu | Emma Brunskill | Zoran Popovic
[1] K. Pearson. Biometrika , 1902 .
[2] W. R. Thompson. On the Likelihood That One Unknown Probability Exceeds Another in View of the Evidence of Two Samples , 1933, Biometrika.
[3] Leslie Pack Kaelbling,et al. Input Generalization in Delayed Reinforcement Learning: An Algorithm and Performance Comparisons , 1991, IJCAI.
[4] Michael I. Jordan,et al. Reinforcement Learning with Soft State Aggregation , 1994, NIPS.
[5] Peter Norvig,et al. Artificial Intelligence: A Modern Approach , 1995 .
[6] Andrew McCallum,et al. Learning to Use Selective Attention and Short-Term Memory in Sequential Tasks , 1996 .
[11] Malcolm J. A. Strens,et al. A Bayesian Framework for Reinforcement Learning , 2000, ICML.
[13] Tao Wang,et al. Bayesian sparse sampling for on-line reward optimization , 2005, ICML.
[14] Peter Auer,et al. Logarithmic Online Regret Bounds for Undiscounted Reinforcement Learning , 2006, NIPS.
[15] Martin A. Riedmiller,et al. Abstract State Spaces with History , 2006, NAFIPS 2006 - 2006 Annual Meeting of the North American Fuzzy Information Processing Society.
[16] Sridhar Mahadevan,et al. Constructing basis functions from directed graphs for value function approximation , 2007, ICML '07.
[17] Stephan Timmer,et al. Safe Q-Learning on Complete History Spaces , 2007, ECML.
[18] Michael L. Littman,et al. Efficient Reinforcement Learning with Relocatable Action Models , 2007, AAAI.
[19] Michael L. Littman,et al. An analysis of model-based Interval Estimation for Markov Decision Processes , 2008, J. Comput. Syst. Sci..
[20] Doina Precup,et al. Bounding Performance Loss in Approximate MDP Homomorphisms , 2008, NIPS.
[21] Finale Doshi-Velez,et al. The Infinite Partially Observable Markov Decision Process , 2009, NIPS.
[22] Lihong Li,et al. A Bayesian Sampling Approach to Exploration in Reinforcement Learning , 2009, UAI.
[23] Lihong Li,et al. The adaptive k-meteorologists problem and its application to structure learning and feature selection in reinforcement learning , 2009, ICML '09.
[24] Nicholas Roy,et al. Provably Efficient Learning with Typed Parametric Models , 2009, J. Mach. Learn. Res..
[25] Stephen Lin,et al. Evolutionary Tile Coding: An Automated State Abstraction Algorithm for Reinforcement Learning , 2010, Abstraction, Reformulation, and Approximation.
[26] Steven L. Scott. A modern Bayesian look at the multi-armed bandit , 2010, Applied Stochastic Models in Business and Industry.
[28] Lihong Li,et al. An Empirical Evaluation of Thompson Sampling , 2011, NIPS.
[29] Csaba Szepesvári,et al. Regret Bounds for the Adaptive Control of Linear Quadratic Systems , 2011, COLT.
[31] Viet-Hung Dang,et al. Monte-Carlo tree search for Bayesian reinforcement learning , 2012, 2012 11th International Conference on Machine Learning and Applications.
[32] Wolfgang Ertel,et al. Monte-Carlo tree search for Bayesian reinforcement learning , 2012, Applied Intelligence.
[33] Benjamin Van Roy,et al. (More) Efficient Reinforcement Learning via Posterior Sampling , 2013, NIPS.
[34] Shipra Agrawal,et al. Further Optimal Regret Bounds for Thompson Sampling , 2012, AISTATS.
[35] Ronald Ortner,et al. Adaptive Aggregation for Reinforcement Learning in Average Reward Markov Decision Processes , 2013, Annals of Operations Research.
[36] Peter Dayan,et al. Scalable and Efficient Bayes-Adaptive Reinforcement Learning Based on Monte-Carlo Tree Search , 2013, J. Artif. Intell. Res..
[37] Benjamin Van Roy,et al. Near-optimal Reinforcement Learning in Factored MDPs , 2014, NIPS.
[38] Ronald Ortner,et al. Selecting Near-Optimal Approximate State Representations in Reinforcement Learning , 2014, ALT.
[39] Parag Singla,et al. ASAP-UCT: Abstraction of State-Action Pairs in UCT , 2015, IJCAI.
[41] Shane Legg,et al. Human-level control through deep reinforcement learning , 2015, Nature.