[1] Marc G. Bellemare, et al. A Distributional Perspective on Reinforcement Learning, 2017, ICML.
[2] Amir Beck, et al. First-Order Methods in Optimization, 2017.
[3] Jalaj Bhandari, et al. Global Optimality Guarantees For Policy Gradient Methods, 2019, ArXiv.
[4] Christoph Dann, et al. Being Optimistic to Be Conservative: Quickly Learning a CVaR Policy, 2020, AAAI.
[5] T. Coleman, et al. Minimizing CVaR and VaR for a portfolio of derivatives, 2006.
[6] Shie Mannor, et al. Policy Gradients with Variance Related Risk Criteria, 2012, ICML.
[7] Kamyar Azizzadenesheli, et al. Policy Gradient in Partially Observable Environments: Approximation and Convergence, 2018.
[8] Stephen P. Boyd, et al. CVXPY: A Python-Embedded Modeling Language for Convex Optimization, 2016, J. Mach. Learn. Res.
[9] Zhaoran Wang, et al. Risk-Sensitive Reinforcement Learning: Near-Optimal Risk-Sample Tradeoff in Regret, 2020, NeurIPS.
[10] Shie Mannor, et al. Risk-Sensitive and Robust Decision-Making: a CVaR Optimization Approach, 2015, NIPS.
[11] Qiang Liu, et al. Breaking the Curse of Horizon: Infinite-Horizon Off-Policy Estimation, 2018, NeurIPS.
[12] Philippe Artzner, et al. Coherent Measures of Risk, 1999.
[13] Saeed Ghadimi, et al. Accelerated gradient methods for nonconvex nonlinear and stochastic programming, 2013, Mathematical Programming.
[14] Yaoliang Yu, et al. Distributional Reinforcement Learning for Efficient Exploration, 2019, ICML.
[15] Shie Mannor, et al. Policy Gradient for Coherent Risk Measures, 2015, NIPS.
[16] Alexander Shapiro, et al. Optimization of Convex Risk Functions, 2006, Math. Oper. Res.
[17] Andrzej Ruszczynski, et al. Risk-averse dynamic programming for Markov decision processes, 2010, Math. Program.
[18] Tanner Fiez, et al. Gradient Descent-Ascent Provably Converges to Strict Local Minmax Equilibria with a Finite Timescale Separation, 2020, ArXiv.
[19] Mohammad Ghavamzadeh, et al. Algorithms for CVaR Optimization in MDPs, 2014, NIPS.
[20] Michael C. Fu, et al. Cumulative Prospect Theory Meets Reinforcement Learning: Prediction and Control, 2015, ICML.
[21] Alexander Shapiro, et al. Lectures on Stochastic Programming: Modeling and Theory, 2009.
[22] Carlo Acerbi. Spectral Measures of Risk: A Coherent Representation of Subjective Risk Aversion, 2002.
[23] Georg Ch. Pflug, et al. Time-Consistent Decisions and Temporal Decomposition of Coherent Risk Functionals, 2016, Math. Oper. Res.
[24] Mohammad Ghavamzadeh, et al. Actor-Critic Algorithms for Risk-Sensitive MDPs, 2013, NIPS.
[25] Javier García, et al. A comprehensive survey on safe reinforcement learning, 2015, J. Mach. Learn. Res.
[26] Sham M. Kakade, et al. On the Theory of Policy Gradient Methods: Optimality, Approximation, and Distribution Shift, 2019, J. Mach. Learn. Res.
[27] Stephen Boyd, et al. A Rewriting System for Convex Optimization Problems, 2017, ArXiv.
[28] Shie Mannor, et al. Optimizing the CVaR via Sampling, 2014, AAAI.
[29] Yao Liu, et al. Off-Policy Policy Gradient with Stationary Distribution Correction, 2019, UAI.
[30] Richard S. Sutton, et al. Reinforcement Learning: An Introduction, 1998, IEEE Trans. Neural Networks.