ABC Reinforcement Learning

We introduce a simple, general framework for likelihood-free Bayesian reinforcement learning through Approximate Bayesian Computation (ABC). Its main advantage is that it requires only a prior distribution over a class of simulators. This is useful when a probabilistic model of the underlying process is too complex to formulate, but detailed simulation models are available. ABC-RL allows the use of any Bayesian reinforcement learning technique in this setting, and can be seen as an extension of simulation-based methods to both planning and inference. We experimentally demonstrate the potential of the approach in a comparison with least-squares policy iteration (LSPI). Finally, we present a theorem showing that ABC is sound.
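
To make the high-level idea concrete, below is a minimal rejection-ABC sketch in Python. This is an illustration, not the paper's implementation: the callables `prior_sample`, `simulate`, and `statistic` are hypothetical user-supplied components (a prior over simulator parameters, the simulator itself, and a summary statistic of a trajectory), and the acceptance threshold `epsilon` is the usual ABC tolerance.

```python
import numpy as np

def abc_posterior_sample(prior_sample, simulate, statistic, history,
                         epsilon, max_tries=10_000, rng=None):
    """Rejection ABC: draw simulator parameters whose simulated
    trajectories match the observed history to within `epsilon`
    in summary-statistic distance.

    prior_sample(rng) -> theta        : draw a simulator from the prior
    simulate(theta, T, rng) -> traj   : roll out the simulator for T steps
    statistic(traj) -> ndarray        : summary statistic of a trajectory
    """
    if rng is None:
        rng = np.random.default_rng()
    s_obs = statistic(history)
    for _ in range(max_tries):
        theta = prior_sample(rng)                   # candidate simulator
        traj = simulate(theta, len(history), rng)   # simulated data
        # Accept if the simulated statistics are close to the observed ones;
        # accepted draws are approximate posterior samples over simulators.
        if np.linalg.norm(statistic(traj) - s_obs) <= epsilon:
            return theta
    raise RuntimeError("no acceptance; increase epsilon or max_tries")
```

One natural use of such a draw, in the spirit of Thompson sampling, is to plan (near-)optimally in the accepted simulator, act according to the resulting policy for a while, and then resample as more data accumulates; any other Bayesian RL technique that consumes posterior samples over models can be plugged in the same way.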
