
Adversarial Deep Reinforcement Learning in Portfolio Management

In this paper, we implement three state-of-the-art continuous reinforcement learning algorithms, Deep Deterministic Policy Gradient (DDPG), Proximal Policy Optimization (PPO), and Policy Gradient (PG), in portfolio management. All of them are widely used in game playing and robot control. Moreover, PPO has appealing theoretical properties that make it a promising candidate for portfolio management. We present their performance under different settings, including different learning rates, objective functions, and feature combinations, in order to provide insights for parameter tuning, feature selection, and data preparation. We also conduct intensive experiments on the China stock market and show that PG is more desirable in financial markets than DDPG and PPO, although the latter two are more advanced. In addition, we propose a so-called Adversarial Training method and show that it can greatly improve training efficiency and significantly increase the average daily return and Sharpe ratio in back-tests. With this modification, our experimental results show that our agent based on Policy Gradient can outperform UCRP.
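The two back-test metrics named in the abstract, average daily return and Sharpe ratio, together with a uniform constant rebalanced baseline, can be sketched in a few lines. This is only an illustrative sketch, not the authors' code: the function names, the use of price relatives (today's close divided by yesterday's close) as input, the zero risk-free rate, and the absence of transaction costs and annualization are all assumptions made here.

```python
from statistics import mean, stdev


def ucrp_daily_returns(price_relatives):
    """Daily returns of a uniformly rebalanced portfolio (hypothetical helper).

    price_relatives: list of days, each a list of per-asset ratios
    close_t / close_{t-1}. Wealth is reset to equal weights every day;
    transaction costs are ignored (an assumption of this sketch).
    """
    returns = []
    for day in price_relatives:
        weight = 1.0 / len(day)  # equal weight on every asset
        returns.append(sum(weight * x for x in day) - 1.0)
    return returns


def sharpe_ratio(daily_returns, risk_free_rate=0.0):
    """Mean excess daily return divided by its sample standard deviation.

    Not annualized. Needs at least two observations with nonzero variance.
    """
    excess = [r - risk_free_rate for r in daily_returns]
    return mean(excess) / stdev(excess)


# Two assets over two days (made-up numbers for illustration only).
relatives = [[1.1, 0.9], [1.0, 1.2]]
baseline_returns = ucrp_daily_returns(relatives)
average_daily_return = mean(baseline_returns)
baseline_sharpe = sharpe_ratio(baseline_returns)
```

An agent's daily returns can be passed through the same `sharpe_ratio` function, so that the learned policy and the baseline are compared on identical terms.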

Yanran Li | Zhipeng Liang | Kangkang Jiang | Hao Chen | Junhao Zhu
