Loss bounds for uncertain transition probabilities in Markov decision processes

We analyze losses resulting from uncertain transition probabilities in Markov decision processes with bounded nonnegative rewards. We assume that policies are precomputed using exact dynamic programming with estimated transition probabilities, but that the system evolves according to different, true transition probabilities. Given a bound on the total variation error of the estimated transition probability distributions, we derive upper bounds on the loss of expected total reward. The approach analyzes the growth of errors incurred by stepping backwards in time while precomputing value functions, which requires bounding a multilinear program. Loss bounds are given for the finite-horizon undiscounted, finite-horizon discounted, and infinite-horizon discounted cases, and an example shows that the bounds are tight.
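As a rough numerical illustration of the setting (not code or constants from the paper), the sketch below builds a random discounted MDP, perturbs each transition distribution within a total variation budget eps, plans with the perturbed model, and compares the realized loss against the standard simulation-lemma-style bound 2*gamma*eps*Rmax/(1-gamma)^2. That constant is the common folklore bound for the infinite-horizon discounted case and may differ from the paper's exact bounds; the state/action counts, eps, and gamma are arbitrary choices for the demo.

```python
# Illustrative sketch only: plan on an estimated transition model, evaluate
# the resulting policy on the true model, and compare the loss to a
# simulation-lemma-style bound (assumed form, not the paper's exact constant).
import numpy as np

rng = np.random.default_rng(0)
S, A = 6, 3          # states, actions (arbitrary demo sizes)
gamma = 0.9          # discount factor
Rmax = 1.0           # rewards lie in [0, Rmax]
eps = 0.05           # total variation error budget per (state, action)

# True model: random transition kernel P[s, a, :] and rewards R[s, a].
P = rng.dirichlet(np.ones(S), size=(S, A))
R = rng.uniform(0.0, Rmax, size=(S, A))

# Estimated model: mix each row with a random distribution, so that
# TV(P_hat[s, a], P[s, a]) = 0.5 * ||P_hat - P||_1 <= eps.
noise = rng.dirichlet(np.ones(S), size=(S, A))
P_hat = (1 - eps) * P + eps * noise

def value_iteration(P, R, gamma, iters=2000):
    """Optimal value function and a greedy policy for the given model."""
    V = np.zeros(S)
    for _ in range(iters):
        V = (R + gamma * P @ V).max(axis=1)   # Bellman optimality update
    Q = R + gamma * P @ V
    return V, Q.argmax(axis=1)

def policy_value(P, R, gamma, pi):
    """Exact evaluation of a deterministic policy under the given model."""
    P_pi = P[np.arange(S), pi]                # S x S transition matrix
    r_pi = R[np.arange(S), pi]
    return np.linalg.solve(np.eye(S) - gamma * P_pi, r_pi)

V_true, _ = value_iteration(P, R, gamma)      # optimum under the true model
_, pi_hat = value_iteration(P_hat, R, gamma)  # policy planned on the estimate
V_pi_hat = policy_value(P, R, gamma, pi_hat)  # its true performance

loss = np.max(V_true - V_pi_hat)
bound = 2 * gamma * eps * Rmax / (1 - gamma) ** 2
print(f"worst-state loss = {loss:.4f}, illustrative bound = {bound:.4f}")
```

On random instances like this the realized loss typically sits far below the bound; exhibiting a tight example, as the abstract notes, requires a specially constructed MDP rather than a random one.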
