Mean-Variance Tradeoffs in an Undiscounted MDP: The Unichain Case

The problem analyzed here is the computation of Pareto optima, in the sense of high mean and low variance of the stationary distribution, in the unichain, undiscounted Markov decision process (MDP).
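To make the problem statement concrete, the following is a minimal sketch of what a mean-variance Pareto optimum means for a tiny MDP. It is not the paper's algorithm: it simply enumerates deterministic stationary policies by brute force, which is an assumption of this sketch, as are the names `trans` and `reward`, the toy numbers, and the use of power iteration (which additionally assumes each induced chain is aperiodic). The variance here is taken as the variance of the one-step reward under the stationary distribution.

```python
from itertools import product

def stationary_distribution(P, iters=2000):
    """Approximate the stationary distribution of a row-stochastic matrix P
    by power iteration (assumes the chain is unichain and aperiodic)."""
    n = len(P)
    pi = [1.0 / n] * n
    for _ in range(iters):
        pi = [sum(pi[s] * P[s][t] for s in range(n)) for t in range(n)]
    return pi

def mean_variance(policy, trans, reward):
    """Mean and variance of the one-step reward under the stationary
    distribution induced by a deterministic stationary policy."""
    P = [trans[s][a] for s, a in enumerate(policy)]
    r = [reward[s][a] for s, a in enumerate(policy)]
    pi = stationary_distribution(P)
    mean = sum(p * x for p, x in zip(pi, r))
    var = sum(p * (x - mean) ** 2 for p, x in zip(pi, r))
    return mean, var

def pareto_policies(trans, reward):
    """Return (policy, mean, variance) for every enumerated policy that is
    not dominated, i.e. no other policy has mean >= and variance <= with at
    least one strict inequality."""
    action_ranges = [range(len(actions)) for actions in trans]
    scored = [(pol, *mean_variance(pol, trans, reward))
              for pol in product(*action_ranges)]

    def dominated(m, v):
        return any(m2 >= m and v2 <= v and (m2 > m or v2 < v)
                   for _, m2, v2 in scored)

    return [(pol, m, v) for pol, m, v in scored if not dominated(m, v)]

# Hypothetical toy data: trans[s][a] is the transition row for action a in
# state s, and reward[s][a] is the corresponding one-step reward.
trans = [
    [[0.5, 0.5], [0.9, 0.1]],
    [[0.5, 0.5], [0.2, 0.8]],
]
reward = [
    [1.0, 0.0],
    [3.0, 2.0],
]

front = pareto_policies(trans, reward)
for pol, m, v in front:
    print(pol, round(m, 4), round(v, 4))
```

Each printed row is one policy on the mean-variance frontier of this toy problem. Brute-force enumeration grows exponentially with the number of states, so it only serves to illustrate the definition.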

[1] Matthew J. Sobel, et al. Mean-Variance Tradeoffs in an Undiscounted MDP, 1994, Oper. Res.

[2] M. J. Sobel, et al. Discounted MDP's: distribution functions and exponential utility maximization, 1987.

[3] Jerzy A. Filar, et al. Variance-Penalized Markov Decision Processes, 1989, Math. Oper. Res.

[4] Ronald A. Howard, et al. Dynamic Programming and Markov Processes, 1960.

[5] Kun-Jen Chung. A note on maximal mean/standard deviation ratio in an undiscounted MDP, 1989.

[6] D. White. Mean, variance, and probabilistic criteria in finite Markov decision processes: A review, 1988.

[7] H. Kawai. A variance minimization problem for a Markov decision process, 1987.