Variance-Penalized Markov Decision Processes

We consider a Markov decision process under both the expected limiting-average and the discounted total-return criteria, each appropriately modified to include a penalty for the variability in the stream of rewards. In both cases we formulate appropriate nonlinear programs in the space of state-action frequencies (averaged or discounted, respectively) whose optimal solutions are shown to be related to the optimal policies of the corresponding “variance-penalized MDP.” The analysis of one of the discounted cases is facilitated by the introduction of a “Cartesian product of two independent MDPs.”
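For illustration, a minimal sketch of the limiting-average case under assumed notation (which may differ from the body of the paper): let $x_{sa}$ denote the averaged state-action frequencies, $r(s,a)$ the one-step rewards, $p(j \mid s,a)$ the transition probabilities, and $\lambda > 0$ a penalty weight. The program then carries the standard average-reward linear constraints but a quadratic, mean-minus-penalized-variance objective:
\[
\max_{x \ge 0} \;\; \sum_{s,a} r(s,a)\,x_{sa} \;-\; \lambda \Bigl[ \sum_{s,a} r(s,a)^{2}\,x_{sa} \;-\; \Bigl( \sum_{s,a} r(s,a)\,x_{sa} \Bigr)^{\!2} \Bigr]
\]
subject to
\[
\sum_{a} x_{ja} \;-\; \sum_{s,a} p(j \mid s,a)\,x_{sa} \;=\; 0 \quad \text{for all states } j, \qquad \sum_{s,a} x_{sa} \;=\; 1,
\]
so the nonlinearity enters only through the squared-mean term in the objective; the feasible region remains the usual polytope of averaged frequencies.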