Reinforcement Learning for Portfolio Management

In this thesis, we develop a comprehensive account of the expressive power, modelling efficiency, and performance advantages of trading agents, namely the Deep Soft Recurrent Q-Network (DSRQN) and the Mixture of Score Machines (MSM), based both on traditional system identification (a model-based approach) and on context-independent agents (a model-free approach). The analysis provides conclusive support for the ability of model-free reinforcement learning methods to act as universal trading agents, which not only reduce computational and memory complexity (owing to their linear scaling with the size of the trading universe), but also generalize across assets and markets, regardless of the universe on which they were trained. The relative scarcity of daily financial market data is addressed through data augmentation (a generative approach) and pre-training strategies, both of which are validated against current state-of-the-art models. For rigour, a risk-sensitive framework which accounts for transaction costs is considered, and its performance advantages are demonstrated in a variety of scenarios, from synthetic time series (sinusoidal, sawtooth, and chirp waves) and simulated market series (based on surrogate data) through to real market data (S&P 500 and EURO STOXX 50). The analysis and simulations confirm the superiority of universal model-free reinforcement learning agents over current portfolio management models in asset allocation strategies, with performance advantages of up to 9.2% in annualized cumulative returns and 13.4% in annualized Sharpe ratio.
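
To make the headline evaluation criteria concrete, the sketch below (in Python, assuming NumPy and a 252-day annualization convention) shows how annualized cumulative returns and the annualized Sharpe ratio can be computed from daily portfolio returns, together with one common form of risk-sensitive, transaction-cost-aware per-step reward; the cost rate and the exact reward shape are illustrative assumptions, not necessarily the formulation used in the thesis.

```python
import numpy as np

TRADING_DAYS = 252  # assumed annualization factor for daily data


def annualized_cumulative_return(daily_returns):
    """Geometric annualized return from a series of simple daily returns."""
    total_growth = np.prod(1.0 + np.asarray(daily_returns))
    years = len(daily_returns) / TRADING_DAYS
    return total_growth ** (1.0 / years) - 1.0


def annualized_sharpe_ratio(daily_returns, risk_free_daily=0.0):
    """Annualized Sharpe ratio of daily excess returns."""
    excess = np.asarray(daily_returns) - risk_free_daily
    return np.sqrt(TRADING_DAYS) * excess.mean() / excess.std(ddof=1)


def risk_sensitive_reward(weights_new, weights_old, asset_returns, cost_rate=0.002):
    """One-step log-return reward, net of proportional transaction costs.

    A common formulation in RL portfolio management (assumed here for
    illustration): the agent earns the log growth of the portfolio and
    pays `cost_rate` per unit of turnover.
    """
    gross_growth = float(np.dot(weights_new, 1.0 + np.asarray(asset_returns)))
    turnover = np.abs(np.asarray(weights_new) - np.asarray(weights_old)).sum()
    return np.log(gross_growth) - cost_rate * turnover
```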

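The three families of synthetic test signals (sinusoidal, sawtooth, and chirp waves) and the surrogate-data market simulation can be reproduced, under assumed parameters, roughly as follows; the phase-randomization construction shown here is the standard Fourier-surrogate technique, which preserves the amplitude spectrum of a real return series while randomizing its phases, and SciPy's signal module is assumed to be available. It is a sketch of the evaluation scenarios, not the thesis's exact data pipeline.

```python
import numpy as np
from scipy.signal import sawtooth, chirp  # assumed SciPy is available


def synthetic_price_series(kind, n=1000, freq=0.01):
    """Deterministic test 'price' waves: sinusoidal, sawtooth, or chirp."""
    t = np.arange(n)
    if kind == "sine":
        return 1.0 + 0.5 * np.sin(2 * np.pi * freq * t)
    if kind == "sawtooth":
        return 1.0 + 0.5 * sawtooth(2 * np.pi * freq * t)
    if kind == "chirp":
        # frequency sweeps linearly from `freq` to 10 * `freq` over the sample
        return 1.0 + 0.5 * chirp(t, f0=freq, f1=10 * freq, t1=n)
    raise ValueError(f"unknown kind: {kind}")


def phase_randomized_surrogate(returns, rng=None):
    """Surrogate series with the same amplitude spectrum but random phases."""
    rng = np.random.default_rng() if rng is None else rng
    x = np.asarray(returns, dtype=float)
    spectrum = np.fft.rfft(x)
    phases = rng.uniform(0, 2 * np.pi, size=spectrum.shape)
    phases[0] = 0.0  # keep the zero-frequency (mean) component real
    if len(x) % 2 == 0:
        phases[-1] = 0.0  # keep the Nyquist component real for even lengths
    surrogate_spectrum = np.abs(spectrum) * np.exp(1j * phases)
    return np.fft.irfft(surrogate_spectrum, n=len(x))
```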