Robust Markov Decision Process: Beyond Rectangularity

Markov decision processes (MDPs) are a common framework for modeling dynamic optimization problems. An MDP is specified by a set of states, a set of actions, a transition probability kernel, and the rewards associated with transitions; the goal is to find a policy that maximizes the expected cumulative reward. In most real-world problems, however, the model parameters are estimated from noisy observations and are therefore uncertain, and the optimal policy for the nominal parameters can be highly sensitive to even small perturbations in those parameters, leading to significantly suboptimal outcomes. To address this issue, we consider a robust approach in which the uncertainty in the transition probabilities is modeled as an adversarial selection from an uncertainty set. Most prior work considers rectangular uncertainty sets, in which the uncertainty in the transitions out of different states is uncoupled, since the case of general uncertainty sets is known to be intractable. We consider a factor model in which the transition probabilities are linear functions of an uncertain factor matrix that belongs to a factor matrix uncertainty set. This model captures dependence between transition probabilities across different states and is significantly less conservative than prior approaches. We show that, under a certain assumption, an optimal robust policy under the factor matrix uncertainty model can be computed efficiently. We also show that an optimal robust policy can be chosen to be deterministic and, in particular, is an optimal policy for some transition kernel in the uncertainty set; this implies strong min-max duality. We introduce robust counterparts of important structural results for classical MDPs, and we provide a computational study with two examples in which robustness improves worst-case and empirical performance while maintaining reasonable performance on the nominal parameters.
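To make the factor model concrete, the following is a minimal sketch of the setup described above; the notation ($u_{sa}^{i}$ for the coefficients, $w^{i}$ for the factors, $\mathcal{W}$ for the factor matrix uncertainty set) is illustrative rather than a definitive statement of the formulation. Each transition distribution is a fixed linear combination of $r$ uncertain factors, and the adversary selects the entire factor matrix from a single, coupled uncertainty set:

\[
P_{sa} \;=\; \sum_{i=1}^{r} u_{sa}^{i}\, w^{i},
\qquad
W = (w^{1}, \dots, w^{r}) \in \mathcal{W},
\]

where the coefficients $u_{sa}^{i} \ge 0$ with $\sum_{i=1}^{r} u_{sa}^{i} = 1$ are known for every state-action pair $(s,a)$, and each factor $w^{i}$ is an uncertain probability distribution over the states. The robust control problem is then the max-min problem

\[
\max_{\pi} \; \min_{W \in \mathcal{W}} \;
\mathbb{E}^{\pi,\, P(W)}\!\left[ \sum_{t=0}^{\infty} \gamma^{t}\, r_{s_t a_t} \right],
\]

where $P(W)$ denotes the transition kernel induced by the factor matrix $W$. Because every row $P_{sa}$ depends on the same matrix $W$, perturbations of the transitions out of different states are coupled; rectangular models, by contrast, let the adversary choose each row $P_{sa}$ independently, which is what makes them more conservative.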
