
Effect of Reward Function Choices in Risk-Averse Reinforcement Learning

This paper studies Value-at-Risk problems in finite-horizon Markov decision processes (MDPs) with a finite state space and two forms of reward function. First, we study the effect of the reward function on two criteria in a short-horizon MDP. Second, for long-horizon MDPs, we estimate the total reward distribution in a finite-horizon Markov chain (MC) using spectral theory and the central limit theorem, and present a transformation algorithm for MCs with a three-argument reward function and a salvage reward.
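To make the notion of a total reward distribution concrete, here is a minimal sketch in Python. It is not the paper's algorithm: instead of the spectral or central-limit estimate, it enumerates the exact distribution for a short horizon by dynamic programming over (state, accumulated reward) pairs. The names `total_reward_distribution` and `value_at_risk`, the transition-based reward `r[x][y]`, and the reading of Value-at-Risk as the lower `alpha`-quantile of the total reward are all assumptions made for illustration.

```python
from collections import defaultdict


def total_reward_distribution(P, r, init, horizon):
    """Exact distribution of the total reward over `horizon` transitions.

    P[x][y] is the transition probability from state x to state y,
    r[x][y] is the (assumed) reward earned on that transition,
    init[x] is the initial probability of state x.
    """
    # dist maps (current state, accumulated reward) -> probability
    dist = {(x, 0): p for x, p in enumerate(init) if p > 0}
    for _ in range(horizon):
        nxt = defaultdict(float)
        for (x, total), p in dist.items():
            for y, pxy in enumerate(P[x]):
                if pxy > 0:
                    nxt[(y, total + r[x][y])] += p * pxy
        dist = nxt
    # Marginalize out the final state to keep only the total reward.
    out = defaultdict(float)
    for (_, total), p in dist.items():
        out[total] += p
    return dict(out)


def value_at_risk(dist, alpha):
    """Smallest total reward t with P(total <= t) >= alpha (assumed definition)."""
    cum = 0.0
    for t in sorted(dist):
        cum += dist[t]
        if cum >= alpha - 1e-12:  # small tolerance for float round-off
            return t
    return max(dist)


# Hypothetical two-state chain: reward 1 whenever the next state is state 1.
P = [[0.5, 0.5], [0.5, 0.5]]
r = [[0, 1], [0, 1]]
dist = total_reward_distribution(P, r, init=[1.0, 0.0], horizon=2)
var_30 = value_at_risk(dist, alpha=0.3)
```

The number of (state, accumulated reward) pairs can grow quickly with the horizon, which is one reason an approximation of the distribution becomes attractive for long-horizon problems.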

Jia Yuan Yu | Shuai Ma
