Light robustness in the optimization of Markov decision processes with uncertain parameters

Abstract: Markov decision processes are often specified with limited knowledge of the real behavior, or are part of a partially unknown environment, so that transition rates and rewards are not exactly known. Several models have been proposed to describe this uncertainty formally. In all cases, uncertainty must be taken into account when computing optimal control policies. Usually this is done by computing robust solutions, which are optimal under the worst realization of the uncertainty. However, such solutions tend to be very conservative. In this paper, we develop an approach to mitigate this conservatism by computing policies that are optimal in a predefined situation, such as the average case, but also guarantee a minimal gain in all other situations, including the worst case. We present algorithms based on policy iteration that solve subproblems using Mixed Integer Linear Programming (MILP) or Nonlinear Programming (NLP).
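The light-robustness idea can be illustrated on a toy model. The following Python sketch is not the paper's policy-iteration algorithm with MILP or NLP subproblems. It is a brute-force enumeration under assumptions that are ours alone: a discrete-time, discounted MDP, a finite list of uncertainty scenarios, and the hypothetical names `evaluate`, `light_robust_policy`, and `make_P`. It picks the deterministic policy with the best value in the nominal scenario among those whose value in every scenario stays above a threshold `min_gain`.

```python
from itertools import product

GAMMA = 0.9  # discount factor (an assumption of this sketch)


def evaluate(policy, P, R, gamma=GAMMA, iters=2000):
    """Discounted value of a deterministic policy in state 0.

    P[s][a][t] is the probability of moving from s to t under action a,
    R[s][a] the immediate reward. Uses plain fixed-point iteration.
    """
    n = len(R)
    v = [0.0] * n
    for _ in range(iters):
        v = [
            R[s][policy[s]]
            + gamma * sum(P[s][policy[s]][t] * v[t] for t in range(n))
            for s in range(n)
        ]
    return v[0]


def light_robust_policy(nominal, scenarios, R, n_actions, min_gain):
    """Best nominal policy whose value is at least min_gain in all scenarios.

    Returns (None, -inf) if no deterministic policy meets the threshold.
    """
    best, best_val = None, float("-inf")
    for policy in product(range(n_actions), repeat=len(R)):
        worst = min(evaluate(policy, P, R) for P in scenarios)
        if worst < min_gain:
            continue  # violates the guaranteed minimal gain
        val = evaluate(policy, nominal, R)
        if val > best_val:
            best, best_val = policy, val
    return best, best_val


def make_P(p):
    """Toy model: in state 0, action 0 is safe, action 1 is risky and
    stays in state 0 with probability p. State 1 is absorbing."""
    return [
        [[1.0, 0.0], [p, 1.0 - p]],
        [[0.0, 1.0], [0.0, 1.0]],
    ]


R = [[1.0, 2.0], [0.0, 0.0]]          # rewards R[s][a]
nominal = make_P(0.95)                # the predefined "average" case
scenarios = [nominal, make_P(0.5)]    # includes a pessimistic case

strict_policy, strict_val = light_robust_policy(nominal, scenarios, R, 2, 5.0)
relaxed_policy, relaxed_val = light_robust_policy(nominal, scenarios, R, 2, 3.0)
print(strict_policy, strict_val)
print(relaxed_policy, relaxed_val)
```

In this toy model the risky action is worth more in the nominal scenario but much less in the pessimistic one, so raising `min_gain` pushes the choice toward the safe action while lowering it admits the risky one. Enumeration is only viable for very small models, which is one reason optimization-based subproblems are of interest for realistic sizes.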
