Robust Control of Markov Decision Processes with Uncertain Transition Matrices

Optimal solutions to Markov decision problems can be very sensitive to the estimated state transition probabilities, and in many practical problems these estimates are far from accurate. Estimation errors are thus a limiting factor in applying Markov decision processes to real-world problems. We consider a robust control problem for a finite-state, finite-action Markov decision process in which the uncertainty on the transition matrices is described by possibly nonconvex sets. We show that perfect duality holds for this problem and that, as a consequence, it can be solved with a variant of the classical dynamic programming algorithm, the "robust dynamic programming" algorithm. A particular choice of the uncertainty sets, involving likelihood regions or entropy bounds, leads both to a statistically accurate representation of uncertainty and to a robust recursion whose complexity is almost the same as that of the classical recursion; robustness can therefore be added at practically no extra computing cost. We derive similar results for other uncertainty sets, including one consisting of a finite number of possible transition matrices. A practical path-planning example illustrates the benefit of the robust strategy over the classical optimal strategy: even when the uncertainty level is only crudely guessed, the robust strategy yields a much better worst-case expected travel time.
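To make the robust recursion concrete, below is a minimal sketch of robust value iteration for the simplest uncertainty model mentioned in the abstract: the transition matrices are only known to lie in a finite set of candidate models. It assumes a rectangular uncertainty structure, so the worst case can be taken independently for each state-action pair at every Bellman backup. The function name, array shapes, discount factor, and toy numbers are illustrative assumptions, not the paper's notation.

```python
import numpy as np

def robust_value_iteration(scenarios, rewards, gamma=0.95, tol=1e-8, max_iter=10_000):
    """Robust value iteration over a finite set of candidate transition models.

    scenarios : array (K, A, S, S); scenarios[k, a, s, :] is the next-state
        distribution for action a in state s under the k-th candidate model.
    rewards   : array (A, S); immediate reward for action a in state s.
    Returns the robust (max-min) value function and a greedy robust policy.
    """
    K, A, S, _ = scenarios.shape
    V = np.zeros(S)
    for _ in range(max_iter):
        # Q[k, a, s]: value of taking action a in state s under scenario k.
        Q = rewards[None, :, :] + gamma * (scenarios @ V)   # shape (K, A, S)
        # Adversary picks the worst scenario per (s, a); agent picks the best action.
        V_new = Q.min(axis=0).max(axis=0)                   # shape (S,)
        if np.max(np.abs(V_new - V)) < tol:
            V = V_new
            break
        V = V_new
    policy = (rewards + gamma * (scenarios @ V).min(axis=0)).argmax(axis=0)
    return V, policy

if __name__ == "__main__":
    # Toy instance: 2 candidate transition models, 2 actions, 2 states.
    P1 = np.array([[[0.9, 0.1], [0.2, 0.8]],
                   [[0.5, 0.5], [0.3, 0.7]]])
    P2 = np.array([[[0.6, 0.4], [0.4, 0.6]],
                   [[0.5, 0.5], [0.1, 0.9]]])
    scenarios = np.stack([P1, P2])           # shape (2, 2, 2, 2)
    rewards = np.array([[1.0, 0.0],
                        [0.5, 0.8]])         # rewards[a, s]
    V, policy = robust_value_iteration(scenarios, rewards, gamma=0.9)
    print("robust values:", V, "robust policy:", policy)
```

For the likelihood-region or entropy-bound uncertainty sets studied in the paper, the inner minimum over a finite set above would be replaced by a small convex computation per state-action pair; that inner problem is cheap, which is why the robust recursion stays nearly as expensive as the classical one.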
