Paper Information - Robust Policy Optimization with Baseline Guarantees

Robust Policy Optimization with Baseline Guarantees

Our goal is to compute a policy that guarantees improved return over a baseline policy even when the available MDP model is inaccurate. The inaccurate model may be constructed, for example, by system identification techniques when the true model is inaccessible. When the modeling error is large, the standard solution to the constructed model has no performance guarantees with respect to the true model. In this paper, we develop algorithms that provide such performance guarantees and show a trade-off between their complexity and conservatism. Our novel model-based safe policy search algorithms leverage recent advances in robust optimization techniques. Furthermore, we illustrate the effectiveness of these algorithms using a numerical example.

M. Ghavamzadeh | Marek Petrik | Yinlam Chow

[1] Martin L. Puterman, et al. Markov Decision Processes: Discrete Stochastic Dynamic Programming, 1994.

[2] Dimitri P. Bertsekas, et al. Nonlinear Programming, 1997.

[3] Leslie Pack Kaelbling, et al. On the Complexity of Solving Markov Decision Problems, 1995, UAI.

[4] E. Altman. Constrained Markov Decision Processes, 1999.

[5] Ness B. Shroff, et al. Markov decision processes with uncertain transition rates: sensitivity and robust control, 2002, Proceedings of the 41st IEEE Conference on Decision and Control, 2002.

[6] Paul R. Milgrom, et al. Envelope Theorems for Arbitrary Choice Sets, 2002.

[7] Shie Mannor, et al. Action Elimination and Stopping Conditions for Reinforcement Learning, 2003, ICML.

[8] Garud Iyengar, et al. Robust Dynamic Programming, 2005, Math. Oper. Res.

[9] Yinyu Ye, et al. Distributionally Robust Optimization Under Moment Uncertainty with Application to Data-Driven Problems, 2010, Oper. Res.

[10] Daniel Kuhn, et al. Robust Markov Decision Processes, 2013, Math. Oper. Res.

[11] Patrick Jaillet, et al. Regret based Robust Solutions for Uncertain Markov Decision Processes, 2013, NIPS.