Effect of Reward Function Choices in Risk-Averse Reinforcement Learning

This paper studies Value-at-Risk problems in finite-horizon Markov decision processes (MDPs) with a finite state space and two forms of reward function. First, we study the effect of the form of the reward function on two criteria in a short-horizon MDP. Second, for long-horizon MDPs, we estimate the total reward distribution in a finite-horizon Markov chain (MC) using spectral theory and the central limit theorem, and we present a transformation algorithm for MCs with a three-argument reward function and a salvage reward.
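
To make the role of the reward function's form concrete, the following minimal sketch compares two reward forms on a toy two-state Markov chain. It is an illustration only, not the paper's algorithm. The chain `P`, the rewards `R`, the horizon, and the helper names are all hypothetical, and Value-at-Risk is taken here to be the lower alpha-quantile of the total reward, which is one of several conventions. The sketch replaces a transition-dependent reward by its conditional expectation given the current state and computes the exact total reward distribution in both cases.

```python
from collections import defaultdict

def total_reward_distribution(P, reward, s0, horizon):
    """Exact distribution of the total reward over `horizon` transitions.

    P[s][t] is the probability of moving from state s to state t, and
    reward(s, t) is the reward collected on that transition.
    Returns a dict {total_reward: probability}.
    """
    # dist maps (current state, accumulated reward) -> probability
    dist = {(s0, 0.0): 1.0}
    for _ in range(horizon):
        nxt = defaultdict(float)
        for (s, acc), p in dist.items():
            for t, p_st in enumerate(P[s]):
                if p_st > 0.0:
                    nxt[(t, acc + reward(s, t))] += p * p_st
        dist = nxt
    # Marginalize out the final state.
    out = defaultdict(float)
    for (_, acc), p in dist.items():
        out[round(acc, 9)] += p
    return dict(out)

def mean(dist):
    return sum(x * p for x, p in dist.items())

def value_at_risk(dist, alpha):
    """Assumed convention: smallest x with P(total <= x) >= alpha."""
    cum = 0.0
    for x in sorted(dist):
        cum += dist[x]
        if cum >= alpha - 1e-12:
            return x
    return max(dist)

# Hypothetical toy chain: a reward of 4 is earned whenever the state changes.
P = [[0.5, 0.5],
     [0.25, 0.75]]
R = [[0.0, 4.0],
     [4.0, 0.0]]

def full_reward(s, t):
    # Reward that depends on the current and the next state.
    return R[s][t]

def simplified_reward(s, t):
    # Expected one-step reward given the current state only; t is ignored.
    return sum(P[s][u] * R[s][u] for u in range(len(P)))

d_full = total_reward_distribution(P, full_reward, s0=0, horizon=3)
d_simple = total_reward_distribution(P, simplified_reward, s0=0, horizon=3)

print("means:", mean(d_full), mean(d_simple))
print("VaR at alpha = 0.1:", value_at_risk(d_full, 0.1), value_at_risk(d_simple, 0.1))
```

In this toy case both reward forms give the same expected total reward (4.875), but the 0.1-quantile is 0 under the transition-dependent reward and 4 under the simplified one. A simplification that is harmless for an expected-value criterion can therefore change a quantile-based one.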
