Paper information - The variance of discounted Markov decision processes

Formulae are presented for the variance and higher moments of the present value of single-stage rewards in a finite Markov decision process. Similar formulae are exhibited for a semi-Markov decision process. There is a short discussion of the obstacles to using the variance formula in algorithms to maximize the mean minus a multiple of the standard deviation.
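
As an illustration of the kind of variance formula the abstract refers to, the sketch below computes the mean and variance of the discounted return for a finite Markov chain under one fixed policy. It is a minimal sketch under stated assumptions, not a reproduction of the paper's notation: the function name `discounted_mean_and_variance`, the argument layout (`P[s][t]` as transition probability, `r[s][t]` as a deterministic reward earned on the transition from `s` to `t`, `beta` as the discount factor with 0 <= beta < 1), and the use of simple fixed-point iteration are all choices made here for illustration. It relies on the standard recursions v = r_bar + beta P v for the mean and psi = theta + beta^2 P psi for the variance, where theta_s = sum_t P[s][t] (r[s][t] + beta v_t)^2 - v_s^2.

```python
def discounted_mean_and_variance(P, r, beta, tol=1e-12, max_iter=100_000):
    """Mean and variance of the discounted return from each start state.

    Hypothetical helper for illustration. Assumes a fixed policy, so the
    process is a Markov chain with transition matrix P and rewards r[s][t]
    received on the transition s -> t, and a discount factor 0 <= beta < 1.
    """
    n = len(P)

    # Expected one-step reward from each state.
    r_bar = [sum(P[s][t] * r[s][t] for t in range(n)) for s in range(n)]

    def solve(b, discount):
        # Fixed-point iteration for x = b + discount * P x.
        # It converges because discount < 1 and P is a stochastic matrix.
        x = [0.0] * n
        for _ in range(max_iter):
            new = [
                b[s] + discount * sum(P[s][t] * x[t] for t in range(n))
                for s in range(n)
            ]
            if max(abs(new[s] - x[s]) for s in range(n)) < tol:
                return new
            x = new
        return x

    # Mean of the discounted return: v = r_bar + beta * P v.
    v = solve(r_bar, beta)

    # One-step variance term: theta_s = E[(r + beta * v_next)^2] - v_s^2.
    theta = [
        sum(P[s][t] * (r[s][t] + beta * v[t]) ** 2 for t in range(n)) - v[s] ** 2
        for s in range(n)
    ]

    # Variance of the discounted return: psi = theta + beta^2 * P psi.
    psi = solve(theta, beta ** 2)
    return v, psi


if __name__ == "__main__":
    # Two states visited independently with probability 1/2 each;
    # a reward of 1 is earned whenever the chain lands in state 1.
    P = [[0.5, 0.5], [0.5, 0.5]]
    r = [[0.0, 1.0], [0.0, 1.0]]
    mean, variance = discounted_mean_and_variance(P, r, beta=0.5)
    print(mean, variance)
```

In the usage example the per-step rewards are independent Bernoulli(1/2) draws, so the result can be checked by hand against 0.5 / (1 - beta) for the mean and 0.25 / (1 - beta^2) for the variance. Note that the variance recursion discounts by beta^2 rather than beta, which is one reason the variance does not fit the usual dynamic programming structure used to maximize the mean.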

M. J. Sobel

[1] John G. Kemeny, et al. Finite Markov Chains, 1960.

[2] E. Denardo. Contraction Mappings in the Theory Underlying Dynamic Programming, 1967.

[3] Cyrus Derman, et al. Finite State Markovian Decision Processes, 1970.

[4] A. S. Harding. Markovian Decision Processes, 1970.

[5] E. Denardo. Markov Renewal Programs with Small Interest Rates, 1971.

[6] Petr Mandl. On the Variance in Controlled Markov Chains, 1971, Kybernetika.

[7] D. Sworder, et al. Introduction to Stochastic Control, 1972.

[8] S. C. Jaquette. Markov Decision Processes with a New Optimality Criterion: Discrete Time, 1973.

[9] D. J. White. Technical Note - Dynamic Programming and Probabilistic Constraints, 1974, Oper. Res.

[10] Ioan M. Stancu-Minasian, et al. A Research Bibliography in Stochastic Programming, 1955-1975, 1976, Oper. Res.

[11] Evan L. Porteus, et al. Temporal Resolution of Uncertainty and Dynamic Choice Theory, 1978.

[12] John A. Ferejohn, et al. On the Foundations of Intertemporal Choice, 1978.

[13] Roy Mendelssohn. A Systematic Approach to Determining Mean-Variance Tradeoffs When Managing Randomly Varying Populations, 1980.