Markov decision processes with recursive risk measures

In this paper, we consider risk-sensitive Markov decision processes (MDPs) with Borel state and action spaces and unbounded cost, under both finite and infinite planning horizons. Our optimality criterion is based on the recursive application of static risk measures. This approach is motivated by recursive utilities in the economics literature; it has been studied before for the entropic risk measure and is extended here to an axiomatic characterization of suitable risk measures. We derive a Bellman equation and prove the existence of Markovian optimal policies. For an infinite planning horizon, the model is shown to be contractive and the optimal policy to be stationary. Moreover, we establish a connection to distributionally robust MDPs, which provides a global interpretation of the recursively defined objective function. Monotone models are studied in particular.
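
To make the recursive criterion concrete, the following is a minimal sketch on a toy problem, not the paper's general Borel-space model. It assumes a finite MDP with discount factor `beta` in (0, 1) and uses the entropic risk measure, the special case the abstract mentions as previously studied. Value iteration then applies a Bellman operator of the assumed form J(x) = min_a [ c(x, a) + beta * rho(J(X')) ], where rho is applied to the next-state value under the transition law of (x, a). All function names, the risk-aversion parameter `gamma`, the placement of the stage cost outside the risk measure, and the toy data are assumptions introduced for illustration.

```python
import math


def entropic_risk(values, probs, gamma):
    """Entropic risk (1/gamma) * log E[exp(gamma * Y)] of a discrete cost Y."""
    # Shift by the maximum for numerical stability (log-sum-exp trick).
    m = max(values)
    s = sum(p * math.exp(gamma * (v - m)) for v, p in zip(values, probs))
    return m + math.log(s) / gamma


def q_values(J, P, c, beta, gamma):
    """One-step risk-adjusted cost of each action, given value function J."""
    return [
        [c[x][a] + beta * entropic_risk(J, P[x][a], gamma)
         for a in range(len(c[x]))]
        for x in range(len(c))
    ]


def value_iteration(P, c, beta, gamma, tol=1e-10, max_iter=10_000):
    """Iterate the assumed risk-sensitive Bellman operator to a fixed point."""
    J = [0.0] * len(c)
    for _ in range(max_iter):
        J_new = [min(row) for row in q_values(J, P, c, beta, gamma)]
        gap = max(abs(u - v) for u, v in zip(J_new, J))
        J = J_new
        if gap < tol:
            break
    # Stationary policy that is greedy with respect to the final J.
    Q = q_values(J, P, c, beta, gamma)
    policy = [row.index(min(row)) for row in Q]
    return J, policy


# Hypothetical toy data: 2 states, 2 actions.
# P_toy[x][a] is the next-state distribution, c_toy[x][a] the stage cost.
P_toy = [[[0.9, 0.1], [0.5, 0.5]],
         [[0.2, 0.8], [0.6, 0.4]]]
c_toy = [[1.0, 0.5],
         [2.0, 3.0]]

J_toy, policy_toy = value_iteration(P_toy, c_toy, beta=0.9, gamma=1.0)
print(J_toy, policy_toy)
```

Because the entropic risk measure is monotone and translation invariant, this operator is a sup-norm contraction with modulus `beta`, so the iteration converges and the greedy policy is stationary. This mirrors, in the finite toy setting only, the contraction and stationarity properties stated for the infinite-horizon model.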
