Markov Decision Processes with Average-Value-at-Risk criteria

We investigate the problem of minimizing the Average-Value-at-Risk (AVaR_τ) of the discounted cost generated by a Markov Decision Process (MDP) over both a finite and an infinite horizon. We show that this problem can be reduced to an ordinary MDP with an extended state space and give conditions under which an optimal policy exists. We also give a time-consistent interpretation of the AVaR_τ. Finally, we consider a numerical example, a simple repeated casino game, and use it to discuss the influence of the risk aversion parameter τ of the AVaR_τ criterion.
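To make the criterion concrete, the following minimal sketch computes the AVaR of a discrete cost distribution. It relies on the well-known Rockafellar-Uryasev representation AVaR_τ(X) = min over s of { s + E[(X - s)^+] / (1 - τ) }, which the abstract itself does not state. The level convention (τ in [0, 1), with larger τ meaning more weight on the worst outcomes) and the function name `avar` are assumptions made for illustration only.

```python
def avar(values, probs, tau):
    """AVaR at level tau of a discrete cost distribution (illustrative sketch).

    Assumes costs (larger is worse) and 0 <= tau < 1, so that tau = 0
    gives the plain expectation.
    """
    if not 0.0 <= tau < 1.0:
        raise ValueError("tau must lie in [0, 1)")
    if abs(sum(probs) - 1.0) > 1e-9:
        raise ValueError("probabilities must sum to 1")

    def objective(s):
        # s + E[(X - s)^+] / (1 - tau)
        excess = sum(p * max(x - s, 0.0) for x, p in zip(values, probs))
        return s + excess / (1.0 - tau)

    # The objective is convex and piecewise linear in s with kinks only at
    # the atoms of the distribution, so the minimum is attained at an atom.
    return min(objective(s) for s in values)


# Hypothetical cost distribution: 0 w.p. 0.5, 1 w.p. 0.4, 10 w.p. 0.1
costs = [0.0, 1.0, 10.0]
weights = [0.5, 0.4, 0.1]
for level in (0.0, 0.8, 0.9):
    print(level, avar(costs, weights, level))
```

For a fixed s, the inner term E[(X - s)^+] is an ordinary expected cost. This is one way to see why a reduction to a standard MDP with an enlarged state is plausible, although the precise construction is the one given in the paper, not in this sketch.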
