暂无分享,去创建一个
[1] Mohammad Ghavamzadeh,et al. Algorithms for CVaR Optimization in MDPs , 2014, NIPS.
[2] E. Fama,et al. The Cross‐Section of Expected Stock Returns , 1992 .
[3] Richard S. Sutton,et al. Reinforcement Learning: An Introduction , 1998, IEEE Trans. Neural Networks.
[4] Stephen P. Boyd,et al. Convex Optimization , 2004, Algorithms and Theory of Computation Handbook.
[5] Alec Radford,et al. Proximal Policy Optimization Algorithms , 2017, ArXiv.
[6] R. Leal,et al. Maximum Drawdown , 2005 .
[7] Bin Wang,et al. The Kelly Growth Optimal Portfolio with Ensemble Learning , 2019, AAAI.
[8] Shimon Whiteson,et al. Mean-Variance Policy Iteration for Risk-Averse Reinforcement Learning , 2020, AAAI.
[9] Fritz Wysotzki,et al. Risk-Sensitive Reinforcement Learning Applied to Control under Constraints , 2005, J. Artif. Intell. Res..
[10] Jing Peng,et al. Function Optimization using Connectionist Reinforcement Learning Algorithms , 1991 .
[11] Mohammad Ghavamzadeh,et al. Actor-Critic Algorithms for Risk-Sensitive MDPs , 2013, NIPS.
[12] Javier García,et al. A comprehensive survey on safe reinforcement learning , 2015, J. Mach. Learn. Res..
[13] Michael W. Brandt. Portfolio Choice Problems , 2010 .
[14] Ronald J. Williams,et al. Simple Statistical Gradient-Following Algorithms for Connectionist Reinforcement Learning , 2004, Machine Learning.
[15] Wojciech Zaremba,et al. OpenAI Gym , 2016, ArXiv.
[16] Shie Mannor,et al. Policy Gradients with Variance Related Risk Criteria , 2012, ICML.
[17] Jun Wang,et al. Portfolio Blending via Thompson Sampling , 2016, IJCAI.
[18] Martin L. Puterman,et al. Markov Decision Processes: Discrete Stochastic Dynamic Programming , 1994 .
[19] Marco Pavone,et al. Risk-Constrained Reinforcement Learning with Percentile Risk Criteria , 2015, J. Mach. Learn. Res..
[20] J. Lewellen. The Cross Section of Expected Stock Returns , 2014 .
[21] Mohammad Ghavamzadeh,et al. Variance-constrained actor-critic algorithms for discounted and average reward MDPs , 2014, Machine Learning.
[22] J. Hull. Options, Futures, and Other Derivatives , 1989 .
[23] Victor DeMiguel,et al. Optimal Versus Naive Diversification: How Inefficient is the 1/N Portfolio Strategy? , 2009 .
[24] W. Sharpe,et al. Mean-Variance Analysis in Portfolio Choice and Capital Markets , 1987 .
[25] Peter L. Bartlett,et al. Infinite-Horizon Policy-Gradient Estimation , 2001, J. Artif. Intell. Res..
[26] Natalia Gimelshein,et al. PyTorch: An Imperative Style, High-Performance Deep Learning Library , 2019, NeurIPS.
[27] Yishay Mansour,et al. Policy Gradient Methods for Reinforcement Learning with Function Approximation , 1999, NIPS.
[28] Sergey Levine,et al. Trust Region Policy Optimization , 2015, ICML.
[29] Youyong Kong,et al. Deep Direct Reinforcement Learning for Financial Signal Representation and Trading , 2017, IEEE Transactions on Neural Networks and Learning Systems.
[30] Marcello Restelli,et al. Risk-Averse Trust Region Optimization for Reward-Volatility Reduction , 2019, IJCAI.
[31] Alex Graves,et al. Asynchronous Methods for Deep Reinforcement Learning , 2016, ICML.
[32] C. Patel. Optimal versus Naive Diversification: How Inefficient Is the 1/N Portfolio Strategy? , 2009 .
[33] Bo Liu,et al. A Block Coordinate Ascent Algorithm for Mean-Variance Optimization , 2018, NeurIPS.
[34] John N. Tsitsiklis,et al. Neuro-Dynamic Programming , 1996, Encyclopedia of Machine Learning.