PAC-MDP Reinforcement Learning with Bayesian Priors

To build on recent advances in reinforcement learning and Bayesian modeling, this work (Asmuth et al., 2009) combines ideas from two lines of research on exploration in reinforcement learning, or RL (Sutton & Barto, 1998). Bayesian RL research (Dearden et al., 1999; Poupart et al., 2006) formulates the RL problem as decision making in the belief space of all possible environment models. Under this formulation, it becomes meaningful to talk about optimal RL: selecting actions that maximize the expected long-term reward given the uncertainty in the model. Although progress has been made in approximating optimal policies in model belief space, these techniques have not been shown to scale well, and they come with no finite-sample guarantees on the quality of the derived policies.
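
As a rough illustration of what a belief over environment models can look like, the sketch below keeps a Dirichlet posterior over next-state probabilities for a single state-action pair. It is a minimal example under stated assumptions, not the algorithm of the cited work: the class name `DirichletBelief`, the symmetric prior, and the three-state toy data are all hypothetical choices made for illustration.

```python
import random


class DirichletBelief:
    """Belief over next-state probabilities for one (state, action) pair.

    Hypothetical helper for illustration; assumes a discrete state space
    and a symmetric Dirichlet prior.
    """

    def __init__(self, n_states, prior_count=1.0):
        # One pseudo-count per possible next state.
        self.alpha = [prior_count] * n_states

    def update(self, next_state):
        # With a Dirichlet prior, the Bayesian update is a count increment.
        self.alpha[next_state] += 1.0

    def mean(self):
        # Posterior mean of the transition distribution.
        total = sum(self.alpha)
        return [a / total for a in self.alpha]

    def sample(self, rng):
        # Draw one plausible transition distribution from the posterior
        # by normalizing independent gamma draws.
        draws = [rng.gammavariate(a, 1.0) for a in self.alpha]
        total = sum(draws)
        return [d / total for d in draws]


belief = DirichletBelief(n_states=3)
for s_next in [2, 2, 1, 2]:  # observed next states (toy data)
    belief.update(s_next)

posterior_mean = belief.mean()
sampled_model = belief.sample(random.Random(0))
```

Acting optimally with respect to such a belief means planning over how the counts would change after every possible future observation, which is one way to see why exact Bayesian RL is hard to scale.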

[1] Ronen I. Brafman, et al. R-MAX - A General Polynomial Time Algorithm for Near-Optimal Reinforcement Learning, 2001, J. Mach. Learn. Res.

[2] Michael L. Littman, et al. Efficient Structure Learning in Factored-State MDPs, 2007, AAAI.

[3] Sham M. Kakade, et al. On the sample complexity of reinforcement learning, 2003.

[4] Michael Kearns, et al. Near-Optimal Reinforcement Learning in Polynomial Time, 2002, Machine Learning.

[5] Michael L. Littman, et al. Efficient Reinforcement Learning with Relocatable Action Models, 2007, AAAI.

[6] Lihong Li, et al. The adaptive k-meteorologists problem and its application to structure learning and feature selection in reinforcement learning, 2009, ICML '09.

[7] Michael Kearns, et al. Efficient Reinforcement Learning in Factored MDPs, 1999, IJCAI.

[8] Richard S. Sutton, et al. Reinforcement Learning: An Introduction, 1998, IEEE Trans. Neural Networks.

[9] Michael L. Littman, et al. A unifying framework for computational reinforcement learning theory, 2009.

[10] Thomas J. Walsh, et al. Knows what it knows: a framework for self-aware learning, 2008, ICML.

[11] Jesse Hoey, et al. An analytic solution to discrete Bayesian reinforcement learning, 2006, ICML.

[12] David Andre, et al. Model based Bayesian Exploration, 1999, UAI.

[13] Lihong Li, et al. A Bayesian Sampling Approach to Exploration in Reinforcement Learning, 2009, UAI.

[14] Claude-Nicolas Fiechter, et al. Efficient reinforcement learning, 1994, COLT '94.