Mean, variance, and probabilistic criteria in finite Markov decision processes: A review

This paper surveys work that uses nonstandard Markov decision process criteria (i.e., criteria which do not seek simply to optimize expected return per unit time or expected discounted return). It covers infinite-horizon nondiscounted formulations, infinite-horizon discounted formulations, and finite-horizon formulations. For problem formulations stated solely in terms of the probabilities of being in each state and taking each action, policy equivalence results are given which allow policies to be restricted to the class of Markov policies or to the randomizations of deterministic Markov policies. For problems which cannot be stated in such terms using the primitive state set I, formulations involving a redefinition of the states are examined.
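
As an illustration of what a formulation "solely in terms of the probabilities of being in each state and taking each action" can look like, the sketch below assumes the infinite-horizon nondiscounted case with a single recurrent class. The symbols x(i,a) (long-run probability of being in state i and taking action a), p(j | i,a) (transition probability), and r(i,a) (one-step reward) are notation introduced here for illustration and are not taken from the paper. The variance shown is the stationary variance of the one-step reward, which is only one of several variance notions in this literature.

```latex
\begin{align*}
  % Feasible state-action probabilities (assumed unichain, average-reward setting)
  \sum_{a} x(j,a) &= \sum_{i \in I} \sum_{a} p(j \mid i,a)\, x(i,a)
      \qquad \text{for all } j \in I, \\
  \sum_{i \in I} \sum_{a} x(i,a) &= 1, \qquad x(i,a) \ge 0, \\[4pt]
  % Mean reward per unit time: linear in x
  g(x) &= \sum_{i \in I} \sum_{a} r(i,a)\, x(i,a), \\
  % Stationary variance of the one-step reward: not linear in x
  v(x) &= \sum_{i \in I} \sum_{a} r(i,a)^{2}\, x(i,a) \;-\; g(x)^{2}.
\end{align*}
```

Under these assumptions the mean g(x) is linear in x, whereas v(x) is a linear term minus a squared linear term and so is concave rather than linear. This is one way to see why a mean-variance criterion does not reduce to an ordinary linear program over x.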
