Robust Dynamic Programming

In this paper we propose a robust formulation for discrete-time dynamic programming (DP). The objective of the robust formulation is to systematically mitigate the sensitivity of the optimal DP policy to ambiguity in the underlying transition probabilities. The ambiguity is modeled by associating a set of conditional measures with each state-action pair. Consequently, in the robust formulation each policy has a set of measures associated with it. We prove that when this set of measures has a certain "rectangularity" property, all of the main results for finite- and infinite-horizon DP extend to natural robust counterparts. We discuss techniques from Nilim and El Ghaoui [18] for constructing suitable sets of conditional measures that allow one to efficiently solve for the optimal robust policy. We also show that robust DP is equivalent to stochastic zero-sum games with perfect information.
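To make the rectangularity idea concrete, the following is a minimal sketch of robust value iteration for an infinite-horizon discounted-cost problem. It assumes a cost-minimization setting in which the controller picks an action and an adversary then picks the worst measure from that state-action pair's own set, i.e. it iterates V(s) = min_a [ c(s,a) + lambda * max_{p in P(s,a)} sum_{s'} p(s') V(s') ]. Because each set P(s,a) is chosen independently of the others (the rectangularity assumption), the inner maximization decouples state by state. For simplicity each P(s,a) is assumed to be a finite list of candidate distributions, so the inner maximization is plain enumeration; the sets built with the techniques of Nilim and El Ghaoui are generally infinite and would need an inner optimization instead. The function name, arguments and the 3-state example are hypothetical.

```python
import numpy as np


def robust_value_iteration(costs, measure_sets, discount, tol=1e-10, max_iter=10_000):
    """Robust value iteration for a discounted-cost MDP (illustrative sketch).

    costs[s][a]        : immediate cost c(s, a)
    measure_sets[s][a] : finite list of candidate next-state distributions for
                         (s, a); sets are specified independently per (s, a),
                         which is the rectangularity assumption.
    discount           : discount factor lambda in [0, 1)
    """
    n_states = len(costs)
    v = np.zeros(n_states)
    policy = []
    for _ in range(max_iter):
        v_new = np.empty(n_states)
        policy = []
        for s in range(n_states):
            q = []
            for a in range(len(costs[s])):
                # Adversary: choose the measure with the largest expected cost-to-go.
                worst = max(float(np.dot(p, v)) for p in measure_sets[s][a])
                q.append(costs[s][a] + discount * worst)
            # Controller: choose the action with the smallest worst-case cost.
            best_a = int(np.argmin(q))
            policy.append(best_a)
            v_new[s] = q[best_a]
        # The robust Bellman operator is a contraction, so this converges.
        if np.max(np.abs(v_new - v)) < tol:
            return v_new, policy
        v = v_new
    return v, policy


# Hypothetical example. State 0: action 0 "stay" (cost 1, remain in state 0),
# action 1 "move" (cost 0). Under the nominal measure "move" reaches the
# zero-cost absorbing state 1; an alternative measure in the robust set sends
# it to the costly absorbing state 2 (cost 3 per step) half of the time.
costs = [[1.0, 0.0], [0.0], [3.0]]
nominal_sets = [
    [[[1, 0, 0]], [[0, 1, 0]]],
    [[[0, 1, 0]]],
    [[[0, 0, 1]]],
]
robust_sets = [
    [[[1, 0, 0]], [[0, 1, 0], [0, 0.5, 0.5]]],
    [[[0, 1, 0]]],
    [[[0, 0, 1]]],
]

v_nom, pi_nom = robust_value_iteration(costs, nominal_sets, 0.9)
v_rob, pi_rob = robust_value_iteration(costs, robust_sets, 0.9)
print(pi_nom, pi_rob)  # [1, 0, 0] [0, 0, 0]
```

With only the nominal measure the optimal policy moves out of state 0 (value about 0), whereas the robust policy stays put (value about 10 versus a worst-case 13.5 for moving). One way to read the nested min/max is as a two-player game in which nature observes the controller's action before choosing a measure, which is in the spirit of the perfect-information game equivalence stated above.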

[1] Dean Gillette, et al. Stochastic Games with Zero Stop Probabilities, 1958.

[2] D. Ellsberg. Decision, Probability, and Utility: Risk, Ambiguity, and the Savage Axioms, 1961.

[3] J. K. Satia, et al. Markovian Decision Processes with Uncertain Transition Probabilities, 1973, Oper. Res.

[4] I. Gilboa, et al. Maxmin Expected Utility with Non-Unique Prior, 1989.

[5] Thomas M. Cover, et al. Elements of Information Theory, 2005.

[6] Chelsea C. White, et al. Markov Decision Processes with Imprecise Transition Probabilities, 1994, Oper. Res.

[7] Michael L. Littman, et al. Memoryless Policies: Theoretical Limitations and Practical Results, 1994.

[8] Martin L. Puterman, et al. Markov Decision Processes: Discrete Stochastic Dynamic Programming, 1994.

[9] John N. Tsitsiklis, et al. Neuro-Dynamic Programming, 1996.

[10] Arkadi Nemirovski, et al. Robust Truss Topology Design via Semidefinite Programming, 1997, SIAM J. Optim.

[11] Arkadi Nemirovski, et al. Robust Convex Optimization, 1998, Math. Oper. Res.

[12] E. Altman, et al. Weighted Discounted Stochastic Games with Perfect Information, 2000.

[13] T. Sargent, et al. Robust Control and Model Uncertainty, 2001.

[14] Alexander Shapiro, et al. Minimax Analysis of Stochastic Problems, 2002, Optim. Methods Softw.

[15] Larry G. Epstein, et al. Learning Under Ambiguity, 2002.

[16] Donald Goldfarb, et al. Robust Portfolio Selection Problems, 2003, Math. Oper. Res.

[17] Benjamin Van Roy, et al. The Linear Programming Approach to Approximate Dynamic Programming, 2003, Oper. Res.

[18] Laurent El Ghaoui, et al. Robustness in Markov Decision Problems with Uncertain Transition Matrices, 2003, NIPS.

[19] Martin Schneider, et al. Recursive Multiple-Priors, 2003, J. Econ. Theory.

[20] John N. Tsitsiklis, et al. Dynamic Catalog Mailing Policies, 2006, Manag. Sci.