Paper Information - A Deep Reinforcement Learning Framework for the Financial Portfolio Management Problem

Financial portfolio management is the process of continually redistributing a fund across different financial products. This paper presents a financial-model-free reinforcement learning framework that provides a deep machine learning solution to the portfolio management problem. The framework consists of the Ensemble of Identical Independent Evaluators (EIIE) topology, a Portfolio-Vector Memory (PVM), an Online Stochastic Batch Learning (OSBL) scheme, and a fully exploiting and explicit reward function. The framework is realized in three instances in this work: with a Convolutional Neural Network (CNN), a basic Recurrent Neural Network (RNN), and a Long Short-Term Memory (LSTM) network. These, along with a number of recently reviewed or published portfolio-selection strategies, are examined in three back-test experiments with a trading period of 30 minutes in a cryptocurrency market. Cryptocurrencies are electronic, decentralized alternatives to government-issued money, with Bitcoin the best-known example. All three instances of the framework take the top three positions in all experiments, outperforming the other trading algorithms compared. Despite a high commission rate of 0.25% in the back-tests, the framework achieves at least 4-fold returns in 50 days.
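
To give a sense of scale for the headline figure, the short sketch below converts a 4-fold return over 50 days into the average growth per 30-minute trading period. It is illustrative arithmetic only, not part of the paper's method. It assumes the market trades around the clock (so 48 periods per day) and treats "at least 4-fold" as exactly 4; the variable names are hypothetical.

```python
import math

PERIOD_MINUTES = 30   # trading period stated in the abstract
DAYS = 50             # back-test span stated in the abstract
TOTAL_GROWTH = 4.0    # assumption: "at least 4-fold" taken as exactly 4

# Assumption: continuous 24-hour trading, so every 30-minute slot counts.
periods = DAYS * 24 * 60 // PERIOD_MINUTES

# Average log return per period, and the equivalent compounding factor.
per_period_log_return = math.log(TOTAL_GROWTH) / periods
per_period_growth = math.exp(per_period_log_return)

# Equivalent compounding factor per day.
daily_growth = TOTAL_GROWTH ** (1 / DAYS)

print(f"periods: {periods}")
print(f"average return per period: {per_period_growth - 1:.4%}")
print(f"average return per day: {daily_growth - 1:.2%}")
```

Under these assumptions the 50-day span contains 2400 periods, so the required average return per period, net of commission, is a small fraction of one percent.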
