Risk Aversion in Markov Decision Processes via Near Optimal Chernoff Bounds

The expected return is a widely used objective in decision making under uncertainty. Many algorithms, such as value iteration, have been proposed to optimize it. In risk-aware settings, however, the expected return is often not an appropriate objective to optimize. We propose a new optimization objective for risk-aware planning and show that it has desirable theoretical properties. We also draw connections to previously proposed objectives for risk-aware planning: minmax, exponential utility, percentile and mean minus variance. Our method applies to an extended class of Markov decision processes: we allow costs to be stochastic as long as they are bounded. Additionally, we present an efficient algorithm for optimizing the proposed objective. Synthetic and real-world experiments illustrate the effectiveness of our method at scale.
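As background for the expected-return baseline the abstract refers to, the sketch below runs standard value iteration on a small cost-minimizing MDP. The toy MDP itself (number of states and actions, transition probabilities, costs, discount factor, tolerance) is assumed purely for illustration; this is generic background and not the paper's Chernoff-bound-based risk-aware objective or algorithm.

```python
import numpy as np

# Minimal sketch of value iteration for the expected-cost objective.
# All numbers below are hypothetical; they do not come from the paper.

n_states, n_actions = 2, 2
gamma = 0.9  # assumed discount factor

# P[a, s, s'] = transition probability; C[a, s] = expected immediate cost.
P = np.array([[[0.8, 0.2], [0.1, 0.9]],
              [[0.5, 0.5], [0.6, 0.4]]])
C = np.array([[1.0, 2.0],
              [0.5, 3.0]])

V = np.zeros(n_states)
for _ in range(1000):
    # Bellman optimality backup: minimize expected discounted cost.
    Q = C + gamma * P @ V          # Q[a, s]
    V_new = Q.min(axis=0)
    if np.max(np.abs(V_new - V)) < 1e-8:
        V = V_new
        break
    V = V_new

policy = (C + gamma * P @ V).argmin(axis=0)  # greedy action per state
print("V* =", V, "policy =", policy)
```

The sketch optimizes only the expectation of the return; the paper's contribution is precisely to replace this criterion with a risk-aware one when the spread of the return matters.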
