Controlled Markov chains with risk-sensitive criteria: Average cost, optimality equations, and optimal solutions

Abstract. We study controlled Markov chains with denumerable state space and bounded costs per stage. A (long-run) risk-sensitive average cost criterion, associated with an exponential utility function with a constant risk-sensitivity coefficient, is used as the performance measure. The main assumption on the probabilistic structure of the model is that the transition law satisfies a simultaneous Doeblin condition. Within this framework, our main results can be summarized as follows: if the risk-sensitivity coefficient is small enough, then an associated optimality equation has a bounded solution with a constant value for the optimal risk-sensitive average cost; in addition, under further standard continuity-compactness assumptions, optimal stationary policies are obtained. However, we also show that these conclusions fail to hold, in general, for sufficiently large values of the risk-sensitivity coefficient. Our results therefore disprove previous claims on this topic. Notably, our development is largely self-contained and employs only basic principles of probability and analysis.
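For concreteness, the criterion and the optimality equation mentioned in the abstract can be sketched as follows. This is the standard formulation used in the risk-sensitive MDP literature, not a transcription from the paper itself; the symbols $C$, $\lambda$, $p(\cdot\mid x,a)$, $g$, and $h$ are illustrative notation.

```latex
% Risk-sensitive average cost of a policy \pi from initial state x,
% with cost per stage C(x,a) and risk-sensitivity coefficient \lambda > 0:
J(\pi, x) \;=\; \limsup_{n \to \infty} \frac{1}{n\lambda}
  \log E_x^{\pi}\!\left[ \exp\!\left( \lambda \sum_{t=0}^{n-1} C(X_t, A_t) \right) \right].

% Associated (multiplicative) optimality equation: a constant g and a
% bounded function h solving
e^{\lambda (g + h(x))} \;=\;
  \min_{a \in A(x)} \left\{ e^{\lambda C(x,a)}
  \sum_{y} p(y \mid x, a)\, e^{\lambda h(y)} \right\}, \qquad x \in S,
% identify g as the optimal risk-sensitive average cost; a stationary
% policy attaining the minimum at each state is then optimal.
```

The abstract's positive result corresponds to such a pair $(g, h)$ with $h$ bounded existing for small $\lambda$, and the negative result to its failure for large $\lambda$.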
