Paper information - Model-based Deep Reinforcement Learning for Dynamic Portfolio Optimization

Dynamic portfolio optimization is the process of sequentially allocating wealth to a collection of assets over consecutive trading periods, based on investors' return-risk profile. Automating this process with machine learning remains a challenging problem. Here, we design a deep reinforcement learning (RL) architecture with an autonomous trading agent that makes investment decisions and takes actions periodically and autonomously, based on a global objective. In particular, rather than relying on a purely model-free RL agent, we train our trading agent using a novel RL architecture consisting of an infused prediction module (IPM), a generative adversarial data augmentation module (DAM) and a behavior cloning module (BCM). Our model-based approach works with both on-policy and off-policy RL algorithms. We further design the back-testing and execution engines, which interact with the RL agent in real time. Using real historical financial market data, we simulate trading with practical constraints, and demonstrate that our proposed model is robust, profitable and risk-sensitive, as compared to baseline trading strategies and model-free RL agents from prior work.
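To make the phrase "sequentially allocating wealth over consecutive trading periods" concrete, the following is a minimal back-testing sketch. It is not the authors' engine, and it does not implement the IPM, DAM or BCM. The names `backtest`, `uniform_policy` and `cost_rate` are hypothetical, and both the proportional transaction cost and the equally weighted starting portfolio are assumptions made only for illustration.

```python
import numpy as np


def backtest(price_relatives, policy, cost_rate=0.0):
    """Simulate sequential allocation over consecutive trading periods.

    price_relatives: array of shape (T, N); entry [t, i] is the closing price
        of asset i in period t divided by its opening price.
    policy: callable that maps the price relatives seen so far, shape (t, N),
        to N non-negative weights summing to 1.
    cost_rate: assumed proportional cost charged on turnover (hypothetical).
    Returns the wealth path of length T + 1, starting from wealth 1.0.
    """
    price_relatives = np.asarray(price_relatives, dtype=float)
    n_periods, n_assets = price_relatives.shape
    wealth = 1.0
    # Assumption: the portfolio starts equally weighted.
    held = np.full(n_assets, 1.0 / n_assets)
    wealth_path = [wealth]
    for t in range(n_periods):
        # The policy only sees data from periods before t (no look-ahead).
        target = np.asarray(policy(price_relatives[:t]), dtype=float)
        # Pay the assumed cost on the traded fraction of the portfolio.
        turnover = np.abs(target - held).sum()
        wealth *= 1.0 - cost_rate * turnover
        # Wealth grows by the weighted average of the price relatives.
        growth = float(target @ price_relatives[t])
        wealth *= growth
        # Weights drift with prices until the next rebalancing.
        held = target * price_relatives[t] / growth
        wealth_path.append(wealth)
    return np.array(wealth_path)


def uniform_policy(history):
    """Baseline strategy: rebalance to equal weights every period."""
    n_assets = history.shape[1]
    return np.full(n_assets, 1.0 / n_assets)


if __name__ == "__main__":
    relatives = [[1.1, 0.9], [1.0, 1.2]]
    print(backtest(relatives, uniform_policy, cost_rate=0.01))
```

In this sketch an RL agent would take the place of `uniform_policy`: any callable that returns valid weights from past data can be plugged into the same loop, which is one simple way to compare a learned policy against baseline strategies under the same cost assumption.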
