RAAM: The Benefits of Robustness in Approximating Aggregated MDPs in Reinforcement Learning

We describe how to use robust Markov decision processes for value function approximation with state aggregation. The robustness reduces the sensitivity of the resulting sub-optimal policy to the approximation error, in comparison with classical methods such as fitted value iteration. As a result, the bound on the γ-discounted infinite-horizon performance loss improves by a factor of 1/(1-γ), while polynomial-time computational complexity is preserved. Our experimental results show that the robust representation can significantly improve solution quality at minimal additional computational cost.
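To make the aggregation idea concrete, below is a minimal sketch of one pessimistic (robust) value-iteration variant over aggregated states. It assumes a tabular MDP with transition array P[a, s, s'], rewards r[a, s], and a fixed mapping from states to aggregates; the function name robust_aggregated_vi, its signature, and the worst-case-over-members update are illustrative assumptions, not the paper's exact RAAM construction.

```python
import numpy as np

def robust_aggregated_vi(P, r, agg, gamma=0.95, iters=500, tol=1e-8):
    """Pessimistic value iteration on an aggregated MDP (illustrative sketch).

    P   : array (A, S, S) -- P[a, s, s'] transition probabilities
    r   : array (A, S)    -- immediate rewards r[a, s]
    agg : array (S,)      -- agg[s] is the index of the aggregate containing s
    Returns one value per aggregate state. Every aggregate is assumed non-empty.
    """
    A, S, _ = P.shape
    n_agg = int(agg.max()) + 1
    v = np.zeros(n_agg)
    for _ in range(iters):
        # Value of each original state under the current aggregate values.
        v_s = v[agg]                                    # shape (S,)
        # Action-values of every original state: r + gamma * E[v].
        q = r + gamma * np.einsum("ast,t->as", P, v_s)  # shape (A, S)
        # Pessimistic update: nature picks the worst member state of each
        # aggregate, then the agent maximizes over actions.
        v_new = np.array([
            q[:, agg == k].min(axis=1).max() for k in range(n_agg)
        ])
        if np.max(np.abs(v_new - v)) < tol:
            return v_new
        v = v_new
    return v
```

The inner min over member states is what distinguishes this robust update from plain aggregated value iteration, which would instead average (or pick a representative) over the members of each aggregate.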
