Efficient Bayesian Clustering for Reinforcement Learning

A fundamental artificial intelligence challenge is how to design agents that intelligently trade off exploration and exploitation while quickly learning about an unknown environment. To learn quickly, however, an agent must somehow generalize experience across states. One promising approach is to use Bayesian methods to simultaneously cluster dynamics and control exploration; unfortunately, these methods tend to require computationally intensive MCMC approximation techniques that lack guarantees. We propose Thompson Clustering for Reinforcement Learning (TCRL), a family of Bayesian clustering algorithms for reinforcement learning that leverage structure in the state space to remain computationally efficient while controlling both exploration and generalization. TCRL-Theoretic achieves near-optimal Bayesian regret bounds while consistently improving over a standard Bayesian exploration approach. TCRL-Relaxed is guaranteed to converge to acting optimally, and empirically outperforms state-of-the-art Bayesian clustering algorithms across a variety of simulated domains, even in cases where no states are similar.
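To make the interplay between exploration and generalization concrete, the sketch below illustrates the general posterior-sampling (Thompson sampling) idea for tabular reinforcement learning with a fixed, hypothetical clustering of states that shares Dirichlet transition counts within each cluster. The environment interface, the cluster assignment `cluster_of`, and the priors are illustrative assumptions only; this is not the authors' TCRL algorithm, which learns the clustering itself.

```python
# Minimal sketch: posterior-sampling RL with a FIXED, hypothetical state clustering.
# States in the same cluster share one Dirichlet posterior over next-state dynamics,
# which is how clustering generalizes experience across states. Not the TCRL algorithm.
import numpy as np

def psrl_with_clusters(env, cluster_of, n_states, n_actions, n_clusters,
                       episodes=100, horizon=20, gamma=0.95):
    # Dirichlet pseudo-counts over next states, shared within each (cluster, action).
    counts = np.ones((n_clusters, n_actions, n_states))
    # Running mean reward per (cluster, action), with one pseudo-observation.
    r_sum = np.zeros((n_clusters, n_actions))
    r_n = np.ones((n_clusters, n_actions))

    for _ in range(episodes):
        # 1. Sample one plausible MDP from the posterior (Thompson sampling step).
        P = np.zeros((n_states, n_actions, n_states))
        R = np.zeros((n_states, n_actions))
        for s in range(n_states):
            c = cluster_of[s]
            for a in range(n_actions):
                P[s, a] = np.random.dirichlet(counts[c, a])
                R[s, a] = r_sum[c, a] / r_n[c, a]

        # 2. Solve the sampled MDP by value iteration.
        Q = np.zeros((n_states, n_actions))
        for _ in range(200):
            V = Q.max(axis=1)
            Q = R + gamma * P @ V

        # 3. Act greedily in the real environment; update the shared posterior counts.
        #    Assumes a toy env with reset() -> state and step(a) -> (state, reward, done).
        s = env.reset()
        for _ in range(horizon):
            a = int(Q[s].argmax())
            s_next, reward, done = env.step(a)
            c = cluster_of[s]
            counts[c, a, s_next] += 1
            r_sum[c, a] += reward
            r_n[c, a] += 1
            s = s_next
            if done:
                break
    return Q
```

Because all states in a cluster update the same Dirichlet counts, a single transition informs the dynamics estimate for every state in that cluster; exploration is driven purely by the randomness of the posterior sample rather than by explicit bonuses.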
