Distributionally Robust Markov Decision Processes

We consider Markov decision processes in which the values of the parameters are uncertain. This uncertainty is described by a sequence of nested sets (that is, each set contains the previous one), each of which corresponds to a probabilistic guarantee at a different confidence level. Together, these sets specify a class of admissible probability distributions over the unknown parameters. This formulation models a decision maker who is aware of, and wants to exploit, some (albeit imprecise) a priori information about the distribution of the parameters; it arises naturally in practice, where methods for estimating confidence regions of parameters abound. We propose a decision criterion based on distributional robustness: the optimal strategy maximizes the expected total reward under the most adversarial admissible probability distribution. We show that finding the optimal distributionally robust strategy can be reduced to a standard robust MDP in which the parameters are known to belong to a single uncertainty set; hence, it can be computed in polynomial time under mild technical conditions.
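The reduction to a standard robust MDP means the optimal strategy can be computed by robust dynamic programming, where each Bellman backup takes the worst case over the uncertainty set. The following is a minimal sketch of that idea on a hypothetical two-state, two-action MDP whose uncertainty set for each state-action pair is a finite collection of candidate transition vectors; all numbers and names here are illustrative, not from the paper.

```python
import numpy as np

# Toy robust value iteration: for each (state, action), the transition
# distribution is only known to lie in a finite set of candidates, and the
# robust Bellman backup evaluates the worst case over that set.

n_states, n_actions = 2, 2
gamma = 0.9  # discount factor (illustrative)

# rewards[s, a]: immediate reward for taking action a in state s
rewards = np.array([[1.0, 0.0],
                    [0.0, 2.0]])

# candidates[(s, a)]: plausible transition distributions over next states
candidates = {
    (0, 0): [np.array([0.9, 0.1]), np.array([0.7, 0.3])],
    (0, 1): [np.array([0.5, 0.5]), np.array([0.4, 0.6])],
    (1, 0): [np.array([0.2, 0.8]), np.array([0.3, 0.7])],
    (1, 1): [np.array([0.1, 0.9]), np.array([0.05, 0.95])],
}

def robust_value_iteration(tol=1e-10, max_iter=10_000):
    v = np.zeros(n_states)
    for _ in range(max_iter):
        q = np.empty((n_states, n_actions))
        for s in range(n_states):
            for a in range(n_actions):
                # worst-case expected continuation value over the uncertainty set
                worst = min(p @ v for p in candidates[(s, a)])
                q[s, a] = rewards[s, a] + gamma * worst
        v_new = q.max(axis=1)
        if np.max(np.abs(v_new - v)) < tol:
            break
        v = v_new
    return v_new, q.argmax(axis=1)

v_star, policy = robust_value_iteration()
```

The nested-set formulation in the paper refines this picture: rather than a single flat uncertainty set, the adversary's choice is constrained by a distribution over nested sets, but the resulting optimization still reduces to a backup of this max-min form.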
