Markov Decision Processes with Imprecise Transition Probabilities

We present new numerical algorithms and bounds for the infinite-horizon, discrete-stage, finite-state, finite-action Markov decision process with imprecise transition probabilities. We assume that the transition probability mass vector for each state and action is described by a finite number of linear inequalities. This model of imprecision appears to be well suited for describing statistically determined confidence limits and natural language statements of likelihood. The numerical procedures for calculating an optimal max-min strategy are based on successive approximations, reward revision, and modified policy iteration. The resulting bounds are at least as tight as currently available bounds for the case where the transition probabilities are precise.
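To make the max-min idea concrete, the following is a minimal sketch of successive approximations under two assumptions that are not taken from the paper: rewards are discounted, and the linear inequalities take the special form of interval bounds on each transition probability (lower <= p <= upper, with the entries summing to one). Under interval bounds the inner minimization can be solved greedily; a general set of linear inequalities would instead require a linear program at each step. All function and parameter names here are hypothetical.

```python
def worst_case_expectation(values, lower, upper):
    """Minimize sum(p[j] * values[j]) subject to lower <= p <= upper, sum(p) == 1.

    Assumes the interval bounds are feasible: sum(lower) <= 1 <= sum(upper).
    """
    p = list(lower)
    slack = 1.0 - sum(p)
    # Push the remaining probability mass onto the lowest-value successors first.
    for j in sorted(range(len(values)), key=lambda k: values[k]):
        add = min(upper[j] - p[j], slack)
        p[j] += add
        slack -= add
    return sum(pj * vj for pj, vj in zip(p, values))


def robust_value_iteration(rewards, lower, upper, discount, tol=1e-8, max_iter=10000):
    """Max-min successive approximations (illustrative sketch).

    rewards[s][a]  : immediate reward for action a in state s
    lower[s][a][j] : lower bound on the probability of moving from s to j under a
    upper[s][a][j] : matching upper bound
    Returns the approximate max-min value vector and a greedy policy.
    """
    n_states = len(rewards)

    def q_value(v, s, a):
        # Reward plus discounted worst-case expected value of the next state.
        return rewards[s][a] + discount * worst_case_expectation(
            v, lower[s][a], upper[s][a]
        )

    v = [0.0] * n_states
    for _ in range(max_iter):
        new_v = [
            max(q_value(v, s, a) for a in range(len(rewards[s])))
            for s in range(n_states)
        ]
        gap = max(abs(x - y) for x, y in zip(new_v, v))
        v = new_v
        if gap < tol:
            break

    policy = [
        max(range(len(rewards[s])), key=lambda a: q_value(v, s, a))
        for s in range(n_states)
    ]
    return v, policy
```

When the lower and upper bounds coincide, the sketch reduces to ordinary value iteration for a precise Markov decision process, which is a convenient sanity check.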
