Discounted Markov decision processes with utility constraints

We consider utility-constrained Markov decision processes, in which the expected utility of the total discounted reward is maximized subject to multiple expected-utility constraints. By introducing a corresponding Lagrange function, a saddle-point theorem for the utility-constrained optimization problem is derived. The existence of a constrained optimal policy is characterized via optimal action sets specified by a parametric utility.
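The Lagrangian saddle-point idea can be illustrated numerically. The sketch below is a hedged toy example, not the paper's construction: it uses a single-state MDP with two actions, a linear (risk-neutral) utility instead of the paper's general utility functions, and assumed reward, cost, and discount values. It compares the primal constrained optimum over randomized stationary policies with the dual value of the Lagrange function, and the two coincide at the saddle point.

```python
# Toy sketch (assumed numbers): one state, two actions, discount beta.
# A stationary randomized policy picks action 0 with probability p.
# Maximize expected discounted reward subject to a discounted-cost bound,
# via the Lagrange function L(p, lam) = V_r(p) - lam * (V_c(p) - c_max).

beta = 0.5        # discount factor (chosen for exact arithmetic)
r = (1.0, 0.3)    # per-step reward of actions 0 and 1
c = (1.0, 0.0)    # per-step cost of actions 0 and 1
c_max = 1.0       # bound on expected discounted cost

def value(per_step, p):
    # Discounted value of the stationary policy p in a single state:
    # expected per-step payoff divided by (1 - beta).
    return (p * per_step[0] + (1.0 - p) * per_step[1]) / (1.0 - beta)

grid = [i / 1000.0 for i in range(1001)]

# Primal: best feasible randomized policy (here: largest p with V_c(p) <= c_max).
primal = max(value(r, p) for p in grid if value(c, p) <= c_max)

def dual_fn(lam):
    # Inner maximization of the Lagrangian; it is linear in p,
    # so an extreme point p in {0, 1} attains the maximum.
    return max(value(r, p) - lam * (value(c, p) - c_max) for p in (0.0, 1.0))

# Outer minimization over multipliers lam >= 0 by grid search.
dual = min(dual_fn(i / 100.0) for i in range(201))

print(round(primal, 2), round(dual, 2))  # prints "1.3 1.3" -- no duality gap
```

The equality of the primal and dual values is the saddle-point property in this linear special case; the paper's contribution is establishing an analogous theorem for general (nonlinear) utility functions of the total discounted reward.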
