Automatic Risk Adaptation in Distributional Reinforcement Learning

The use of Reinforcement Learning (RL) agents in practical applications requires accounting for suboptimal outcomes, which depend on the agent's familiarity with its environment. This is especially important in safety-critical environments, where errors can lead to high costs or damage. In distributional RL, risk sensitivity can be controlled via different distortion measures of the estimated return distribution. However, these distortion functions require an estimate of the risk level, which is difficult to obtain and depends on the current state. In this work, we demonstrate the suboptimality of a static risk level estimate and propose a method that dynamically selects the risk level at each environment step. Our method, ARA (Automatic Risk Adaptation), estimates the appropriate risk level in both known and unknown environments using the Random Network Distillation error. Compared to both risk-aware and risk-agnostic agents, we show failure rates reduced by up to a factor of 7 and generalization performance improved by up to 14% in several locomotion environments.
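To make the mechanism concrete, the following is a minimal sketch of the general idea, assuming a discrete-action quantile critic and a simple linear mapping from the Random Network Distillation (RND) prediction error to a CVaR level. The network sizes, the error normalisation, and the mapping itself are illustrative assumptions, not the paper's exact implementation.

```python
import math
import torch
import torch.nn as nn


def make_mlp(in_dim, out_dim, hidden=64):
    return nn.Sequential(nn.Linear(in_dim, hidden), nn.ReLU(), nn.Linear(hidden, out_dim))


class RND(nn.Module):
    """Fixed random target network plus a trained predictor network;
    the prediction error acts as a state-familiarity signal."""

    def __init__(self, obs_dim, emb_dim=32):
        super().__init__()
        self.target = make_mlp(obs_dim, emb_dim)
        self.predictor = make_mlp(obs_dim, emb_dim)
        for p in self.target.parameters():
            p.requires_grad_(False)

    def error(self, obs):
        # Per-state squared prediction error; large for unfamiliar states.
        return (self.predictor(obs) - self.target(obs)).pow(2).mean(dim=-1)


def risk_level_from_rnd(err, running_max, floor=0.1):
    """Hypothetical mapping: high RND error (unfamiliar state) -> small CVaR
    level alpha (more risk-averse); low error -> alpha close to 1 (near
    risk-neutral). The linear form and the floor are assumptions."""
    frac = min(max(err / max(running_max, 1e-8), 0.0), 1.0 - floor)
    return 1.0 - frac


def cvar_action_values(quantiles, alpha):
    """CVaR_alpha from quantile estimates: average the lowest
    ceil(alpha * N) quantiles per action. quantiles: [num_actions, N]."""
    n = quantiles.shape[-1]
    k = max(1, math.ceil(alpha * n))
    sorted_q, _ = torch.sort(quantiles, dim=-1)
    return sorted_q[..., :k].mean(dim=-1)


# Usage at a single environment step (the quantile critic itself is omitted):
obs = torch.randn(1, 8)            # placeholder observation
rnd = RND(obs_dim=8)
quantiles = torch.randn(4, 32)     # placeholder output: 4 actions, 32 quantiles
alpha = risk_level_from_rnd(rnd.error(obs).item(), running_max=1.0)
action = cvar_action_values(quantiles, alpha).argmax().item()
```

In a full agent, the RND predictor would be trained online on visited states and the running error scale updated continuously, so the selected risk level relaxes toward risk-neutral behaviour as the environment becomes familiar.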
