An Alternative to Variance: Gini Deviation for Risk-averse Policy Gradient

Restricting the variance of a policy's return is a popular choice in risk-averse Reinforcement Learning (RL) due to its clear mathematical definition and easy interpretability. Traditional methods directly restrict the variance of the total return, while recent methods restrict the per-step reward variance as a proxy. We thoroughly examine the limitations of these variance-based methods, such as sensitivity to numerical scale and hindering of policy learning, and propose an alternative risk measure, Gini deviation, as a substitute. We study various properties of this risk measure and derive a policy gradient algorithm to minimize it. Empirical evaluation in domains where risk-aversion can be clearly defined shows that our algorithm mitigates the limitations of variance-based risk measures and achieves high return with low risk, in terms of both variance and Gini deviation, when other methods fail to learn a reasonable policy.
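
To make the point about sensitivity to numerical scale concrete, the following is a minimal sketch and not the paper's algorithm. It assumes the standard definition of Gini deviation as half the expected absolute difference between two independent copies of the return, which the abstract itself does not spell out. The function name `gini_deviation` and the sample returns are hypothetical, chosen only for illustration.

```python
from itertools import combinations
from statistics import pvariance


def gini_deviation(samples):
    """Empirical Gini deviation under the assumed definition
    GD(X) = 0.5 * E|X1 - X2|, with X1 and X2 drawn independently
    (with replacement) from the samples."""
    n = len(samples)
    if n < 2:
        return 0.0
    # Each unordered pair appears twice among the n * n ordered pairs,
    # so 0.5 * (2 * pair_sum) / n**2 simplifies to pair_sum / n**2.
    pair_sum = sum(abs(a - b) for a, b in combinations(samples, 2))
    return pair_sum / (n * n)


# Hypothetical episode returns, and the same returns with rewards scaled by 10.
returns = [1.0, 2.0, 3.0, 4.0]
scaled = [10.0 * r for r in returns]

var_ratio = pvariance(scaled) / pvariance(returns)
gd_ratio = gini_deviation(scaled) / gini_deviation(returns)

print("variance grows by a factor of", var_ratio)
print("Gini deviation grows by a factor of", gd_ratio)
```

Under this definition, multiplying every reward by a constant c scales the variance by c squared but the Gini deviation only by |c|, which is one way to read the scale-sensitivity concern raised above.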
