Provably Efficient Iterated CVaR Reinforcement Learning with Function Approximation

Risk-sensitive reinforcement learning (RL) aims to optimize policies that balance expected reward and risk. In this paper, we investigate a novel risk-sensitive RL formulation with an Iterated Conditional Value-at-Risk (CVaR) objective under linear and general function approximation. This new formulation, named ICVaR-RL with function approximation, provides a principled way to guarantee safety at each decision step. For ICVaR-RL with linear function approximation, we propose a computationally efficient algorithm, ICVaR-L, which achieves an $\widetilde{O}(\sqrt{\alpha^{-(H+1)}(d^2H^4+dH^6)K})$ regret, where $\alpha$ is the risk level, $d$ is the dimension of the state-action features, $H$ is the length of each episode, and $K$ is the number of episodes. We also establish a matching lower bound of $\Omega(\sqrt{\alpha^{-(H-1)}d^2K})$, which validates the optimality of ICVaR-L with respect to $d$ and $K$. For ICVaR-RL with general function approximation, we propose the algorithm ICVaR-G, which achieves an $\widetilde{O}(\sqrt{\alpha^{-(H+1)}DH^4K})$ regret, where $D$ is a dimensional parameter that depends on the eluder dimension and the covering number. Furthermore, our analysis provides several novel techniques for risk-sensitive RL, including an efficient approximation of the CVaR operator, a new ridge regression with CVaR-adapted features, and a refined elliptical potential lemma.
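For concreteness, the sketch below recalls the standard Rockafellar–Uryasev variational form of the CVaR operator and a per-step (iterated) CVaR value recursion of the kind this formulation builds on. It is a generic illustration under common conventions, not a reproduction of the exact Bellman equations or confidence-set constructions used by ICVaR-L and ICVaR-G.

$$
\mathrm{CVaR}_{\alpha}(X) \;=\; \sup_{z \in \mathbb{R}} \left\{ z - \frac{1}{\alpha}\,\mathbb{E}\big[(z - X)^{+}\big] \right\}, \qquad \alpha \in (0,1],
$$

$$
V_h(s) \;=\; \max_{a \in \mathcal{A}} \Big\{ r_h(s,a) + \mathrm{CVaR}_{\alpha}\big(V_{h+1}(s')\big) \Big\}, \qquad s' \sim P_h(\cdot \mid s,a), \quad V_{H+1} \equiv 0.
$$

Here the CVaR of the next-state value is applied at every step $h \in [H]$ rather than once to the cumulative return; taking $\alpha = 1$ recovers the usual risk-neutral Bellman optimality equation.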
