Constrained Bayesian Reinforcement Learning via Approximate Linear Programming

This paper highlights our recent work [9] on safe learning, where the exploratory behavior of a reinforcement learning agent must be restricted. Specifically, we cast the problem as Bayesian reinforcement learning (BRL) in an environment modeled as a constrained MDP (CMDP), whose cost function penalizes undesirable situations. We propose a model-based BRL algorithm for such environments that elicits risk-sensitive exploration in a principled way. The algorithm solves the constrained BRL problem efficiently by approximate linear programming and generates a finite-state controller offline. We provide theoretical guarantees and demonstrate empirically that our approach outperforms the state of the art.
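
For orientation, the textbook linear program for a discounted CMDP with a known model can be written over occupancy measures, as sketched here. It is not the approximate LP of the paper, whose details the abstract does not give. All symbols are assumptions introduced for illustration: x(s,a) is the discounted state-action occupancy, R and C are the reward and cost functions, T is the transition model, \gamma is the discount factor, b_0 is the initial state distribution, and \hat{c} is the cost budget.

```latex
\begin{align*}
\max_{x \ge 0} \quad & \sum_{s,a} R(s,a)\, x(s,a) \\
\text{s.t.} \quad & \sum_{a'} x(s',a') - \gamma \sum_{s,a} T(s' \mid s,a)\, x(s,a) = b_0(s') \quad \forall s', \\
& \sum_{s,a} C(s,a)\, x(s,a) \le \hat{c}.
\end{align*}
```

A (generally stochastic) policy can be read off an optimal solution as \pi(a \mid s) = x(s,a) / \sum_{a'} x(s,a'). In the Bayesian setting the state would also carry a belief over the unknown model, so the exact program is no longer finite; this is presumably where an approximation is needed.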

[1] Kee-Eung Kim, et al. Point-Based Value Iteration for Constrained POMDPs, 2011, IJCAI.

[2] Javier García, et al. A comprehensive survey on safe reinforcement learning, 2015, J. Mach. Learn. Res.

[3] Laurent El Ghaoui, et al. Robust Control of Markov Decision Processes with Uncertain Transition Matrices, 2005, Oper. Res.

[4] Christos Dimitrakakis, et al. Linear Bayesian Reinforcement Learning, 2013, IJCAI.

[5] Malcolm J. A. Strens, et al. A Bayesian Framework for Reinforcement Learning, 2000, ICML.

[6] Kee-Eung Kim, et al. Approximate Linear Programming for Constrained Partially Observable Markov Decision Processes, 2015, AAAI.

[7] Andrew G. Barto, et al. Optimal learning: computational procedures for Bayes-adaptive Markov decision processes, 2002.

[8] Nan Rong, et al. What makes some POMDP problems easy to approximate?, 2007, NIPS.

[9] Olivier Buffet, et al. Near-Optimal BRL using Optimistic Local Transitions, 2012, ICML.

[10] Steffen Udluft, et al. Safe exploration for reinforcement learning, 2008, ESANN.

[11] Thomas G. Dietterich. What is machine learning?, 2020, Archives of Disease in Childhood.

[12] Tong Tang, et al. Proceedings of the European Symposium on Artificial Neural Networks, 2006.

[13] Daniel Hernández-Hernández, et al. Risk Sensitive Markov Decision Processes, 1997.

[14] M. D. Wilkinson, et al. Management science, 1989, British Dental Journal.

[15] Andrew Y. Ng, et al. Near-Bayesian exploration in polynomial time, 2009, ICML '09.

[16] E. Altman. Constrained Markov Decision Processes, 1999.

[17] L. Goddard, et al. Operations Research (OR), 2007.

[18] Kee-Eung Kim, et al. Cost-Sensitive Exploration in Bayesian Reinforcement Learning, 2012, NIPS.

[19] Sethu Vijayakumar, et al. ICML '00 Proceedings of the Seventeenth International Conference on Machine Learning, 2000.

[20] Klaus Obermayer, et al. Risk-Sensitive Reinforcement Learning, 2013, Neural Computation.

[21] Stuart J. Russell, et al. Bayesian Q-Learning, 1998, AAAI/IAAI.

[22] Garud Iyengar, et al. Robust Dynamic Programming, 2005, Math. Oper. Res.

[23] Richard S. Sutton, et al. Reinforcement Learning: An Introduction, 1998, IEEE Trans. Neural Networks.

[24] Zoubin Ghahramani, et al. Proceedings of the 24th international conference on Machine learning, 2007, ICML 2007.

[25] Ralph Neuneier, et al. Risk-Sensitive Reinforcement Learning, 1998, Machine Learning.

[26] David Andre, et al. Model based Bayesian Exploration, 1999, UAI.

[27] J. Meigs, et al. WHO Technical Report, 1954, The Yale Journal of Biology and Medicine.

[28] Jesse Hoey, et al. An analytic solution to discrete Bayesian reinforcement learning, 2006, ICML.

[29] David A. McAllester, et al. Proceedings of the Twenty-Fifth Conference on Uncertainty in Artificial Intelligence, 2009, UAI 2009.

[30] Gm Gero Walter, et al. Bayesian linear regression, 2009.

[31] Shay B. Cohen, et al. Advances in Neural Information Processing Systems 25, 2012, NIPS 2012.

[32] Robert M Thrall, et al. Mathematics of Operations Research, 1978.

[33] Javier García, et al. Safe Exploration of State and Action Spaces in Reinforcement Learning, 2012, J. Artif. Intell. Res.

[34] Lihong Li, et al. A Bayesian Sampling Approach to Exploration in Reinforcement Learning, 2009, UAI.

[35] William W. Cohen, et al. Proceedings of the 23rd international conference on Machine learning, 2006, ICML 2006.