Robust Markov Decision Processes

Markov decision processes (MDPs) are powerful tools for decision making in uncertain dynamic environments. However, the solutions of MDPs are of limited practical use because they are sensitive to the distributional model parameters, which are typically unknown and have to be estimated by the decision maker. To counter the detrimental effects of estimation errors, we consider robust MDPs that offer probabilistic guarantees with respect to the unknown parameters. To this end, we assume that an observation history of the MDP is available. Based on this history, we derive a confidence region that contains the unknown parameters with a prespecified probability 1-β. We then determine a policy that attains the highest worst-case performance over this confidence region. By construction, this policy achieves or exceeds its worst-case performance with a confidence of at least 1-β. Our method involves the solution of tractable conic programs of moderate size.
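The two steps above (build a confidence region from the observation history, then optimize the worst case over it) can be illustrated with a small sketch. It is not the method described here, which solves conic programs; it swaps in a simpler, assumed construction: an independent L1 ball around each empirical transition row (a state-action "rectangular" region), sized by a standard L1 concentration bound for empirical distributions plus a union bound over state-action pairs, and solved by robust value iteration. It assumes known rewards, a discount factor gamma < 1, and a history given as (state, action, next_state) triples. All function names are hypothetical, and the bound treats each pair's observed transitions as independent samples, a simplification for data collected along a single trajectory.

```python
import math


def estimate_model(history, n_states, n_actions):
    """Empirical transition rows and visit counts from (s, a, s_next) triples (assumed format)."""
    counts = [[[0] * n_states for _ in range(n_actions)] for _ in range(n_states)]
    for s, a, s_next in history:
        counts[s][a][s_next] += 1
    p_hat, visits = [], []
    for s in range(n_states):
        p_row, n_row = [], []
        for a in range(n_actions):
            n = sum(counts[s][a])
            # Unvisited pairs get a uniform guess; their radius (below) covers the whole simplex.
            p_row.append([c / n for c in counts[s][a]] if n else [1.0 / n_states] * n_states)
            n_row.append(n)
        p_hat.append(p_row)
        visits.append(n_row)
    return p_hat, visits


def l1_radius(n, n_states, delta):
    """Assumed bound: P(||p_hat - p||_1 >= eps) <= 2**S * exp(-n * eps**2 / 2), solved for eps."""
    if n == 0:
        return 2.0  # L1 diameter of the probability simplex
    eps = math.sqrt(2.0 * (n_states * math.log(2.0) + math.log(1.0 / delta)) / n)
    return min(eps, 2.0)


def worst_case_expectation(p_hat, v, radius):
    """Exact min of p . v over distributions p with ||p - p_hat||_1 <= radius (greedy)."""
    p = list(p_hat)
    lowest = min(range(len(v)), key=lambda j: v[j])
    # Shift up to radius / 2 probability mass onto the lowest-value successor ...
    p[lowest] = min(1.0, p_hat[lowest] + radius / 2)
    excess = sum(p) - 1.0
    # ... and take it away from the highest-value successors first.
    for j in sorted(range(len(v)), key=lambda j: v[j], reverse=True):
        if excess <= 0:
            break
        if j == lowest:
            continue
        cut = min(p[j], excess)
        p[j] -= cut
        excess -= cut
    return sum(pj * vj for pj, vj in zip(p, v))


def robust_value_iteration(p_hat, visits, rewards, gamma, beta, tol=1e-8):
    """Robust values and greedy policy; all rows lie in their balls w.p. >= 1 - beta (under the assumptions)."""
    n_states, n_actions = len(p_hat), len(p_hat[0])
    delta = beta / (n_states * n_actions)  # union bound over all (s, a) pairs
    radii = [[l1_radius(visits[s][a], n_states, delta) for a in range(n_actions)]
             for s in range(n_states)]
    v = [0.0] * n_states
    while True:  # terminates because the robust Bellman operator is a gamma-contraction
        q = [[rewards[s][a] + gamma * worst_case_expectation(p_hat[s][a], v, radii[s][a])
              for a in range(n_actions)] for s in range(n_states)]
        v_new = [max(row) for row in q]
        if max(abs(x - y) for x, y in zip(v_new, v)) < tol:
            policy = [max(range(n_actions), key=lambda a: q[s][a]) for s in range(n_states)]
            return v_new, policy
        v = v_new
```

A typical call is `p_hat, visits = estimate_model(history, 2, 2)` followed by `robust_value_iteration(p_hat, visits, rewards, 0.9, 0.1)` for β = 0.1. Rectangularity is what lets the inner minimization decouple per state-action pair and be solved exactly by the greedy step; because the balls are nested, collecting more data (smaller radii) can only raise the robust value.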
