Extreme Risk Averse Policy for Goal-Directed Risk-Sensitive Markov Decision Process

The Goal-Directed Risk-Sensitive Markov Decision Process allows arbitrary risk attitudes in the probabilistic planning problem of reaching a goal state. The risk attitude is modeled by an expected exponential utility with a risk factor λ. However, the problem is not well defined for every λ, which raises the question of how to find the maximum (extreme) value of this factor. In this paper, we propose an algorithm to find this ε-extreme risk factor and the corresponding optimal policy.
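For context, one common form of the criterion in the risk-sensitive literature (an assumption here; sign and direction conventions vary, e.g., between Howard and Matheson's and Patek's formulations) evaluates a policy π by the expected exponential of its accumulated cost until the goal is reached:

$$ V^{\pi}_{\lambda}(s) \;=\; \mathbb{E}^{\pi}\!\left[\, \exp\!\Big(\lambda \sum_{t=0}^{T_G - 1} c(s_t, a_t)\Big) \;\Big|\; s_0 = s \right], $$

where $T_G$ is the (possibly infinite) arrival time at the goal and $\lambda > 0$ expresses risk aversion over costs. Because the exponential can diverge when the goal is not reached quickly enough, the expectation is finite only for λ below a problem-dependent threshold; that threshold is the extreme value discussed above.

The sketch below illustrates one plausible way to approximate such a threshold: risk-sensitive value iteration used as a feasibility test, wrapped in a bisection on λ. It is a minimal illustration under the assumptions above, not the algorithm proposed in the paper; the names (`rs_value_iteration`, `extreme_lambda`) and the data layout are hypothetical.

```python
import math

def rs_value_iteration(P, c, goal, lam, iters=2000, tol=1e-9, cap=1e12):
    """Risk-sensitive value iteration under the exponential-utility
    criterion (cost convention, lam > 0 = risk-averse). Hypothetical sketch.

    P[s][a] -- list of (prob, next_state) pairs; c[s][a] -- step cost;
    goal    -- set of absorbing goal states, fixed at value exp(lam*0) = 1.
    Returns the value table, or None if the iteration blows up, which we
    take as the signal that lam is beyond the feasible range."""
    V = [1.0] * len(P)
    for _ in range(iters):
        V_new = list(V)
        for s in range(len(P)):
            if s in goal:
                continue
            # Bellman backup: minimize E[exp(lam * total cost)]
            V_new[s] = min(
                math.exp(lam * c[s][a]) * sum(p * V[t] for p, t in P[s][a])
                for a in range(len(P[s]))
            )
        if max(V_new) > cap:          # exponential blow-up: lam infeasible
            return None
        if max(abs(x - y) for x, y in zip(V, V_new)) < tol:
            return V_new
        V = V_new
    return None                       # treat non-convergence as infeasible

def extreme_lambda(P, c, goal, lam_hi=10.0, eps=1e-3):
    """Bisection for the largest feasible risk factor, up to tolerance eps."""
    lo, hi = 0.0, lam_hi
    while hi - lo > eps:
        mid = 0.5 * (lo + hi)
        if rs_value_iteration(P, c, goal, mid) is not None:
            lo = mid                  # feasible: push the risk factor up
        else:
            hi = mid                  # diverged: pull back
    return lo
```

The divergence cap is a crude feasibility test chosen to keep the sketch short; a more careful implementation could instead check the spectral radius of the λ-weighted transition matrices of the candidate policies.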
