[1] Javier García,et al. A comprehensive survey on safe reinforcement learning , 2015, J. Mach. Learn. Res..
[2] E. Altman. Constrained Markov Decision Processes , 1999 .
[3] Richard S. Sutton,et al. Directly Estimating the Variance of the λ-Return Using Temporal-Difference Methods , 2018 .
[4] Shie Mannor,et al. Policy Gradients with Variance Related Risk Criteria , 2012, ICML.
[5] Jon Danielsson,et al. Consistent measures of risk , 2006 .
[6] K. Schittkowski,et al. NONLINEAR PROGRAMMING , 2022 .
[7] R. Sutton,et al. GQ(λ): A general gradient algorithm for temporal-difference prediction learning with eligibility traces , 2010 .
[8] A. Tversky,et al. Prospect Theory: An Analysis of Decision under Risk , 2007 .
[9] H. Robbins. A Stochastic Approximation Method , 1951 .
[10] Matteo Hessel,et al. General non-linear Bellman equations , 2019, ArXiv.
[11] R. C. Merton,et al. Lifetime Portfolio Selection under Uncertainty: The Continuous-Time Case , 1969 .
[12] Klaus Obermayer,et al. Risk-Sensitive Reinforcement Learning , 2013, Neural Computation.
[13] Marcello Restelli,et al. Risk-Averse Trust Region Optimization for Reward-Volatility Reduction , 2019, IJCAI.
[14] Shalabh Bhatnagar,et al. An Online Actor–Critic Algorithm with Function Approximation for Constrained Markov Decision Processes , 2012, J. Optim. Theory Appl..
[15] M. Lebreton,et al. Behavioural and neural characterization of optimistic reinforcement learning , 2017, Nature Human Behaviour.
[16] Shie Mannor,et al. Reward Constrained Policy Optimization , 2018, ICLR.
[17] Ronald J. Williams,et al. Simple Statistical Gradient-Following Algorithms for Connectionist Reinforcement Learning , 2004, Machine Learning.
[18] Richard S. Sutton,et al. GQ(λ): A general gradient algorithm for temporal-difference prediction learning with eligibility traces , 2010, Artificial General Intelligence.
[19] Shie Mannor,et al. Learning the Variance of the Reward-To-Go , 2016, J. Mach. Learn. Res..
[20] A. Tversky,et al. Advances in prospect theory: Cumulative representation of uncertainty , 1992 .
[21] F. Sortino,et al. Performance Measurement in a Downside Risk Framework , 1994 .
[22] Karl Tuyls,et al. Robust temporal difference learning for critical domains , 2019, AAMAS.
[23] John N. Tsitsiklis,et al. Algorithmic aspects of mean-variance optimization in Markov decision processes , 2013, Eur. J. Oper. Res..
[24] A. Tversky,et al. Prospect theory: an analysis of decision under risk , 2007 .
[25] Yishay Mansour,et al. Policy Gradient Methods for Reinforcement Learning with Function Approximation , 1999, NIPS.
[26] Giovanni Walter Puopolo. Portfolio selection with transaction costs and default risk , 2017 .
[27] Sham M. Kakade,et al. A Natural Policy Gradient , 2001, NIPS.
[28] S. M. Sunoj,et al. Some properties of conditional partial moments in the context of stochastic modelling , 2019 .
[29] M. Ma,et al. FOUNDATIONS OF PORTFOLIO THEORY , 1990 .
[30] Pieter Abbeel,et al. Constrained Policy Optimization , 2017, ICML.
[31] Vivek S. Borkar,et al. An actor-critic algorithm for constrained Markov decision processes , 2005, Syst. Control. Lett..
[32] Simone Farinelli,et al. Sharpe thinking in asset ranking with one-sided measures , 2008, Eur. J. Oper. Res..
[33] Matthew Saffell,et al. Learning to trade via direct reinforcement , 2001, IEEE Trans. Neural Networks.
[34] Rahul Savani,et al. Robust Market Making via Adversarial Reinforcement Learning , 2020, AAMAS.
[35] Guy Lever,et al. Deterministic Policy Gradient Algorithms , 2014, ICML.
[36] Qihe Tang,et al. Solvency capital, risk measures and comonotonicity: a review , 2004 .
[37] Alex Weissensteiner,et al. A Q-Learning Approach to Derive Optimal Consumption and Investment Strategies , 2008, IEEE Transactions on Neural Networks.
[38] A. Young. Prospect Theory: An Analysis of Decision Under Risk (Kahneman and Tversky, 1979) , 2011 .
[39] Martha White,et al. Directly Estimating the Variance of the λ-Return Using Temporal-Difference Methods , 2018, ArXiv.
[40] Richard S. Sutton,et al. Reinforcement Learning: An Introduction , 1998, IEEE Trans. Neural Networks.
[41] P. Fishburn. Mean-Risk Analysis with Risk Associated with Below-Target Returns , 1977 .
[42] Abhinav Gupta,et al. Robust Adversarial Reinforcement Learning , 2017, ICML.
[43] Shie Mannor,et al. Policy Gradient for Coherent Risk Measures , 2015, NIPS.