Provably Efficient Iterated CVaR Reinforcement Learning with Function Approximation

Risk-sensitive reinforcement learning (RL) aims to optimize policies that balance expected reward and risk. In this paper, we investigate a novel risk-sensitive RL formulation with an Iterated Conditional Value-at-Risk (CVaR) objective under linear and general function approximation. This new formulation, named ICVaR-RL with function approximation, provides a principled way to guarantee safety at each decision step. For ICVaR-RL with linear function approximation, we propose a computationally efficient algorithm, ICVaR-L, which achieves an $\widetilde{O}(\sqrt{\alpha^{-(H+1)}(d^2H^4+dH^6)K})$ regret, where $\alpha$ is the risk level, $d$ is the dimension of the state-action features, $H$ is the length of each episode, and $K$ is the number of episodes. We also establish a matching lower bound of $\Omega(\sqrt{\alpha^{-(H-1)}d^2K})$, which validates the optimality of ICVaR-L with respect to $d$ and $K$. For ICVaR-RL with general function approximation, we propose the algorithm ICVaR-G, which achieves an $\widetilde{O}(\sqrt{\alpha^{-(H+1)}DH^4K})$ regret, where $D$ is a dimensional parameter that depends on the eluder dimension and the covering number. Furthermore, our analysis provides several novel techniques for risk-sensitive RL, including an efficient approximation of the CVaR operator, a new ridge regression with CVaR-adapted features, and a refined elliptical potential lemma.
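For concreteness, the sketch below recalls the standard Rockafellar–Uryasev variational form of the CVaR operator and a per-step (iterated) CVaR value recursion of the kind this formulation builds on. It is a generic illustration under common conventions, not a reproduction of the exact Bellman equations or confidence-set constructions used by ICVaR-L and ICVaR-G.

$$
\mathrm{CVaR}_{\alpha}(X) \;=\; \sup_{z \in \mathbb{R}} \left\{ z - \frac{1}{\alpha}\,\mathbb{E}\big[(z - X)^{+}\big] \right\}, \qquad \alpha \in (0,1],
$$

$$
V_h(s) \;=\; \max_{a \in \mathcal{A}} \Big\{ r_h(s,a) + \mathrm{CVaR}_{\alpha}\big(V_{h+1}(s')\big) \Big\}, \qquad s' \sim P_h(\cdot \mid s,a), \quad V_{H+1} \equiv 0.
$$

Here the CVaR of the next-state value is applied at every step $h \in [H]$ rather than once to the cumulative return; taking $\alpha = 1$ recovers the usual risk-neutral Bellman optimality equation.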
