Constrained Bayesian Reinforcement Learning via Approximate Linear Programming

In this paper, we highlight our recent work [9] considering the safe learning scenario where we need to restrict the exploratory behavior of a reinforcement learning agent. Specifically, we treat the problem as a form of Bayesian reinforcement learning (BRL) in an environment that is modeled as a constrained MDP (CMDP) where the cost function penalizes undesirable situations. We propose a model-based BRL algorithm for such an environment, eliciting risk-sensitive exploration in a principled way. Our algorithm efficiently solves the constrained BRL problem by approximate linear programming, and generates a finite state controller in an off-line manner. We provide theoretical guarantees and demonstrate empirically that our approach outperforms the state of the art.
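For readers unfamiliar with constrained MDPs, the following is a sketch of the textbook occupancy-measure linear program for a finite, discounted CMDP with a known model, as treated in Altman's book listed as [16] below. It is not the program from the paper itself: the symbols x, T, R, C, b_0, \gamma and the cost budget \hat{c} are notation assumed here for illustration, and the algorithm in the abstract targets the constrained BRL problem, where the model is uncertain and the linear program is solved only approximately.

```latex
\begin{align*}
\max_{x \ge 0} \quad & \sum_{s,a} R(s,a)\, x(s,a) \\
\text{s.t.} \quad & \sum_{a} x(s',a) - \gamma \sum_{s,a} T(s' \mid s,a)\, x(s,a) = b_0(s') \quad \forall s', \\
& \sum_{s,a} C(s,a)\, x(s,a) \le \hat{c}, \\[4pt]
\pi(a \mid s) \;=\; & \frac{x(s,a)}{\sum_{a'} x(s,a')} \quad \text{whenever } \sum_{a'} x(s,a') > 0 .
\end{align*}
```

Here x(s,a) is the expected discounted number of times action a is taken in state s when starting from the initial distribution b_0. The objective is the expected discounted reward, the equality constraints enforce flow conservation, and the single inequality caps the expected discounted cost at \hat{c}. The last line recovers a policy from a feasible x; in a CMDP the optimal policy may need to be randomized.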

Kee-Eung Kim | Pascal Poupart | Jongmin Lee | Youngsoo Jang

[1] Kee-Eung Kim, et al. Point-Based Value Iteration for Constrained POMDPs, 2011, IJCAI.

[2] Javier García, et al. A comprehensive survey on safe reinforcement learning, 2015, J. Mach. Learn. Res.

[3] Laurent El Ghaoui, et al. Robust Control of Markov Decision Processes with Uncertain Transition Matrices, 2005, Oper. Res.

[4] Christos Dimitrakakis, et al. Linear Bayesian Reinforcement Learning, 2013, IJCAI.

[5] Malcolm J. A. Strens, et al. A Bayesian Framework for Reinforcement Learning, 2000, ICML.

[6] Kee-Eung Kim, et al. Approximate Linear Programming for Constrained Partially Observable Markov Decision Processes, 2015, AAAI.

[7] Andrew G. Barto, et al. Optimal learning: computational procedures for bayes-adaptive markov decision processes, 2002.

[8] Nan Rong, et al. What makes some POMDP problems easy to approximate?, 2007, NIPS.

[9] Olivier Buffet, et al. Near-Optimal BRL using Optimistic Local Transitions, 2012, ICML.

[10] Steffen Udluft, et al. Safe exploration for reinforcement learning, 2008, ESANN.

[11] Thomas G. Dietterich. What is machine learning?, 2020, Archives of Disease in Childhood.

[12] Tong Tang, et al. Proceedings of the European Symposium on Artificial Neural Networks, 2006.

[13] Daniel Hernández-Hernández, et al. Risk Sensitive Markov Decision Processes, 1997.

[14] M. D. Wilkinson, et al. Management science, 1989, British Dental Journal.

[15] Andrew Y. Ng, et al. Near-Bayesian exploration in polynomial time, 2009, ICML '09.

[16] E. Altman. Constrained Markov Decision Processes, 1999.

[17] L. Goddard, et al. Operations Research (OR), 2007.

[18] Kee-Eung Kim, et al. Cost-Sensitive Exploration in Bayesian Reinforcement Learning, 2012, NIPS.

[19] Sethu Vijayakumar, et al. ICML '00 Proceedings of the Seventeenth International Conference on Machine Learning, 2000.

[20] Klaus Obermayer, et al. Risk-Sensitive Reinforcement Learning, 2013, Neural Computation.

[21] Stuart J. Russell, et al. Bayesian Q-Learning, 1998, AAAI/IAAI.

[22] Garud Iyengar, et al. Robust Dynamic Programming, 2005, Math. Oper. Res.

[23] Richard S. Sutton, et al. Reinforcement Learning: An Introduction, 1998, IEEE Trans. Neural Networks.

[24] Zoubin Ghahramani, et al. Proceedings of the 24th international conference on Machine learning, 2007, ICML 2007.

[25] Ralph Neuneier, et al. Risk-Sensitive Reinforcement Learning, 1998, Machine Learning.

[26] David Andre, et al. Model based Bayesian Exploration, 1999, UAI.

[27] J. Meigs, et al. WHO Technical Report, 1954, The Yale Journal of Biology and Medicine.

[28] Jesse Hoey, et al. An analytic solution to discrete Bayesian reinforcement learning, 2006, ICML.

[29] David A. McAllester, et al. Proceedings of the Twenty-Fifth Conference on Uncertainty in Artificial Intelligence, 2009, UAI 2009.

[30] Gm Gero Walter, et al. Bayesian linear regression, 2009.

[31] Shay B. Cohen, et al. Advances in Neural Information Processing Systems 25, 2012, NIPS 2012.

[32] Robert M Thrall, et al. Mathematics of Operations Research, 1978.

[33] Javier García, et al. Safe Exploration of State and Action Spaces in Reinforcement Learning, 2012, J. Artif. Intell. Res.

[34] Lihong Li, et al. A Bayesian Sampling Approach to Exploration in Reinforcement Learning, 2009, UAI.

[35] William W. Cohen, et al. Proceedings of the 23rd international conference on Machine learning, 2006, ICML 2006.