Model-based Deep Reinforcement Learning for Dynamic Portfolio Optimization

Dynamic portfolio optimization is the process of sequentially allocating wealth across a collection of assets over consecutive trading periods, based on investors' return-risk profiles. Automating this process with machine learning remains a challenging problem. Here, we design a deep reinforcement learning (RL) architecture with an autonomous trading agent that makes investment decisions and takes actions periodically, guided by a global objective. In particular, rather than relying on a purely model-free RL agent, we train our trading agent using a novel RL architecture consisting of an infused prediction module (IPM), a generative adversarial data augmentation module (DAM), and a behavior cloning module (BCM). Our model-based approach works with both on-policy and off-policy RL algorithms. We further design a back-testing and execution engine that interacts with the RL agent in real time. Using historical real financial market data, we simulate trading with practical constraints, and demonstrate that our proposed model is robust, profitable, and risk-sensitive compared to baseline trading strategies and model-free RL agents from prior work.
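The three modules can be understood as distinct augmentations of a standard RL training loop: the IPM enriches the agent's state with forecasts, the DAM enlarges the training data, and the BCM adds an imitation-style auxiliary loss. The sketch below illustrates this decomposition with deliberately simple stand-ins; all function names are hypothetical, and the moving-average forecast, Gaussian jitter, and squared-error loss are placeholders for the learned predictor, GAN generator, and behavior-cloning objective described in the paper.

```python
import numpy as np

rng = np.random.default_rng(0)

def infused_prediction(prices, window=3):
    # IPM stand-in: augment recent returns with a naive next-step
    # forecast (moving-average momentum; the paper uses a learned
    # prediction model instead).
    returns = np.diff(prices) / prices[:-1]
    forecast = returns[-window:].mean()
    return np.append(returns[-window:], forecast)

def augment_data(states, noise=0.01):
    # DAM stand-in: jitter historical states to enlarge the training
    # set (the paper trains a generative adversarial network to
    # synthesize realistic market data).
    jitter = rng.normal(0.0, noise, size=states.shape)
    return np.vstack([states, states + jitter])

def bc_loss(agent_actions, expert_actions):
    # BCM stand-in: mean-squared deviation from a reference "expert"
    # allocation, added as an auxiliary loss to keep the agent's
    # portfolio weights near sensible behavior.
    return float(np.mean((agent_actions - expert_actions) ** 2))

# Toy usage: build an IPM-augmented state, double the batch via DAM,
# and compute a BCM penalty for a random allocation.
prices = np.array([100.0, 101.0, 99.5, 100.5, 102.0])
state = infused_prediction(prices)            # 3 returns + 1 forecast
batch = augment_data(np.tile(state, (8, 1)))  # 8 real + 8 synthetic
loss = bc_loss(rng.random(4), rng.random(4))
```

In the full architecture these placeholder components would be trained jointly with the RL agent, with the BCM term weighted against the agent's return-based objective.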
