Near-Optimal BRL using Optimistic Local Transitions

Model-based Bayesian Reinforcement Learning (BRL) allows a sound formalization of the problem of acting optimally while facing an unknown environment, i.e., of resolving the exploration-exploitation dilemma. However, algorithms explicitly addressing BRL suffer from such a combinatorial explosion that a large body of work relies on heuristic algorithms. This paper introduces BOLT, a simple and (almost) deterministic heuristic algorithm for BRL which is optimistic about the transition function. We analyze BOLT's sample complexity, and show that under certain parameters, the algorithm is near-optimal in the Bayesian sense with high probability. Then, experimental results highlight the key differences between this method and previous work.
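To make the optimism-about-transitions idea concrete, the sketch below shows how BOLT-style planning could look for a flat, discrete MDP with a Dirichlet-multinomial posterior over transitions and known mean rewards. This is a minimal illustration under those assumptions, not the paper's exact specification; the function name `bolt_q_values` and the known-reward setting are illustrative choices, and `eta` stands in for the paper's optimism parameter (the number of artificial transitions granted toward the most favourable next state).

```python
import numpy as np

def bolt_q_values(alpha, R, gamma=0.95, eta=5.0, n_iter=500, tol=1e-6):
    """Sketch of optimistic value iteration in the BOLT spirit.

    alpha : (S, A, S) Dirichlet posterior counts over transitions
            (prior pseudo-counts plus observed transition counts).
    R     : (S, A) known mean rewards.
    eta   : optimism parameter; for each (s, a) the planner imagines eta
            extra observations of the most favourable next state sigma.

    Because the Bellman backup is linear in V[sigma], the optimistic
    choice of sigma is simply the state maximizing the current V.
    """
    S, A, _ = alpha.shape
    counts = alpha.sum(axis=2)                 # (S, A) total pseudo-counts per (s, a)
    V = np.zeros(S)
    Q = np.zeros((S, A))
    for _ in range(n_iter):
        expected = alpha @ V                   # (S, A): sum_s' alpha[s,a,s'] * V[s']
        # Posterior-mean backup after adding eta counts on the best next state.
        optimistic = (expected + eta * V.max()) / (counts + eta)
        Q = R + gamma * optimistic
        V_new = Q.max(axis=1)
        if np.max(np.abs(V_new - V)) < tol:
            V = V_new
            break
        V = V_new
    return Q

# Acting: replan with the current counts at each step and act greedily, e.g.
#   a = np.argmax(bolt_q_values(alpha, R)[s])
# then update alpha[s, a, s_next] += 1 with the observed transition.
```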
