Partially Observable Risk-Sensitive Markov Decision Processes

We consider the problem of minimizing a certainty equivalent of the total or discounted cost generated by a partially observable Markov decision process (POMDP) over a finite and an infinite time horizon. In contrast to the risk-neutral criterion, this optimization criterion takes the variability of the cost into account. It contains as a special case the classical risk-sensitive optimization criterion with an exponential utility. We show that this optimization problem can be solved by embedding it into a completely observable Markov decision process with an extended state space, and we give conditions under which an optimal policy exists. The state space has to be extended by the joint conditional distribution of the current unobserved state and the accumulated cost. In the case of an exponential utility, the problem simplifies considerably and we rediscover what the previous literature has called the information state. However, since we do not use any change of measure techniques here, our approach is s...
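The extended state described above, a joint conditional distribution of the unobserved state and the accumulated cost, can be illustrated with a small filter update. The following is only a minimal sketch under assumptions that are not taken from the paper: finite state, action and observation sets, an undiscounted one-stage cost depending on the current state and action, and observations generated by the next state. The names `update_joint_belief`, `trans`, `obs_prob` and `cost` are hypothetical and chosen for illustration.

```python
from collections import defaultdict


def update_joint_belief(belief, action, obs, trans, obs_prob, cost):
    """One Bayes step for a joint belief over (hidden state, accumulated cost).

    belief   : dict mapping (state, accumulated_cost) -> probability
    trans    : dict mapping (state, action) -> {next_state: probability}
    obs_prob : dict mapping next_state -> {observation: probability}
    cost     : dict mapping (state, action) -> one-stage cost (assumed form)
    """
    unnormalized = defaultdict(float)
    for (x, s), p in belief.items():
        # The accumulated cost grows by the cost incurred in the hidden state x.
        s_next = s + cost[(x, action)]
        for x_next, p_trans in trans[(x, action)].items():
            likelihood = obs_prob[x_next].get(obs, 0.0)
            unnormalized[(x_next, s_next)] += p * p_trans * likelihood
    total = sum(unnormalized.values())
    if total == 0.0:
        raise ValueError("observation has zero probability under this belief")
    # Normalize to obtain the conditional distribution given the observation.
    return {key: w / total for key, w in unnormalized.items()}


# Hypothetical two-state example with a single action "a".
trans = {(0, "a"): {0: 0.5, 1: 0.5}, (1, "a"): {1: 1.0}}
obs_prob = {0: {"lo": 0.8, "hi": 0.2}, 1: {"lo": 0.2, "hi": 0.8}}
cost = {(0, "a"): 1, (1, "a"): 2}
prior = {(0, 0): 0.5, (1, 0): 0.5}

posterior = update_joint_belief(prior, "a", "hi", trans, obs_prob, cost)
for key in sorted(posterior):
    print(key, round(posterior[key], 4))
```

In this sketch the decision maker never sees the hidden state or the realized cost directly; only the action and the observation enter the update, and the belief tracks both quantities jointly, which is why the accumulated cost appears as a second coordinate of the belief's support.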
