A Deep Reinforcement Learning Framework for the Financial Portfolio Management Problem

Financial portfolio management is the process of constantly redistributing a fund across different financial products. This paper presents a financial-model-free Reinforcement Learning framework that provides a deep machine learning solution to the portfolio management problem. The framework consists of the Ensemble of Identical Independent Evaluators (EIIE) topology, a Portfolio-Vector Memory (PVM), an Online Stochastic Batch Learning (OSBL) scheme, and a fully exploiting and explicit reward function. The framework is realized in three instances in this work: a Convolutional Neural Network (CNN), a basic Recurrent Neural Network (RNN), and a Long Short-Term Memory (LSTM). These are examined, along with a number of recently reviewed or published portfolio-selection strategies, in three back-test experiments with a trading period of 30 minutes in a cryptocurrency market. Cryptocurrencies are electronic and decentralized alternatives to government-issued money, with Bitcoin as the best-known example. All three instances of the framework take the top three positions in all experiments, outdistancing the other compared trading algorithms. Despite a high commission rate of 0.25% in the back-tests, the framework achieves at least 4-fold returns in 50 days.
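To put the headline figure in perspective, the short sketch below converts a 4-fold return over 50 days into the average compounded return per 30-minute trading period. It is illustrative arithmetic only, not part of the framework: the function name `implied_period_return` is hypothetical, and the calculation assumes trading runs around the clock (48 periods per day) and that growth compounds evenly across periods.

```python
import math


def implied_period_return(total_growth, days, period_minutes=30):
    """Average per-period return implied by a total growth factor.

    Assumes continuous 24-hour trading and even compounding; both are
    simplifying assumptions made for this illustration.
    """
    # Number of trading periods in the back-test window.
    periods = days * 24 * 60 // period_minutes
    # Solve (1 + rate) ** periods == total_growth for rate.
    rate = math.exp(math.log(total_growth) / periods) - 1.0
    return periods, rate


periods, rate = implied_period_return(4.0, 50)
print(f"{periods} periods, average return per period: {rate:.4%}")
```

Under these assumptions the window holds 2400 periods, and the implied average net return per period is a little under 0.06%, which is smaller than the 0.25% commission rate quoted for the back-tests.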
