Mean-Variance Tradeoffs in an Undiscounted MDP

A stationary policy and an initial state in a Markov decision process (MDP) induce a stationary probability distribution of the reward. The problem analyzed here is to generate the Pareto optima of that stationary distribution, in the sense of high mean and low variance. In the unichain case, Pareto optima can be computed either with policy improvement or with a linear program having the same number of variables and one more constraint than the formulation for gain-rate optimization. The same linear program suffices in the multichain case if the ergodic class is an element of choice.
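
To make the two criteria concrete, the following minimal sketch computes the mean and variance of the stationary reward distribution for a fixed stationary policy. It is an illustration only, not the policy-improvement or linear-programming method of the paper. The function names, the two-state transition matrices, and the reward vectors are all hypothetical, and the power iteration assumes the induced chain is unichain and aperiodic.

```python
def stationary_distribution(P, iters=10_000, tol=1e-12):
    """Approximate the stationary distribution of transition matrix P
    by power iteration (assumes a unichain, aperiodic chain)."""
    n = len(P)
    pi = [1.0 / n] * n
    for _ in range(iters):
        nxt = [sum(pi[i] * P[i][j] for i in range(n)) for j in range(n)]
        if max(abs(a - b) for a, b in zip(nxt, pi)) < tol:
            return nxt
        pi = nxt
    return pi


def mean_variance(P, r):
    """Mean and variance of the reward under the stationary distribution,
    where r[s] is the reward earned in state s under the fixed policy."""
    pi = stationary_distribution(P)
    mean = sum(p * x for p, x in zip(pi, r))
    second_moment = sum(p * x * x for p, x in zip(pi, r))
    return mean, second_moment - mean ** 2


# Hypothetical data: two stationary policies on a two-state MDP.
P_a = [[0.9, 0.1], [0.5, 0.5]]   # policy a: mostly stays in state 0
P_b = [[0.5, 0.5], [0.5, 0.5]]   # policy b: mixes evenly between states
r = [1.0, 3.0]                   # reward by state (same for both policies)

mean_a, var_a = mean_variance(P_a, r)
mean_b, var_b = mean_variance(P_b, r)
print(f"policy a: mean={mean_a:.4f}, variance={var_a:.4f}")
print(f"policy b: mean={mean_b:.4f}, variance={var_b:.4f}")
```

In this made-up example policy b has the higher mean and policy a the lower variance, so neither dominates the other; both would be candidates for the Pareto set among these two policies.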
