Model-based reinforcement learning with non-Gaussian environment dynamics and its application to portfolio optimization.

With the rapid development of quantitative portfolio optimization in financial engineering, many AI-based algorithmic trading strategies have demonstrated promising results, among which reinforcement learning is beginning to show a competitive advantage. However, the environment of real financial markets is complex and hard to simulate fully, given abrupt regime transitions, unpredictable hidden causal factors, heavy-tailed properties, and so on. In this paper, we first adopt heavy-tail-preserving normalizing flows to model the high-dimensional joint probability distribution of the complex trading environment, and we develop a model-based reinforcement learning framework on top of it to better understand the intrinsic mechanisms of quantitative online trading. Second, we experiment with stocks from three financial markets (Dow, NASDAQ, and S&P) and show that, among the three, Dow achieves the best performance on various evaluation metrics under our back-testing system. In particular, the proposed method mitigates the impact of the unforeseen market crisis during the COVID-19 pandemic, yielding a lower maximum drawdown. Third, we explore explanations of our RL algorithm: (1) we use the pattern causality method to study the interactions among the stocks in the environment; (2) we analyze the dynamics loss and the actor loss to verify the convergence of our strategies; (3) by visualizing high-dimensional state-transition data from the real and virtual replay buffers with t-SNE, we uncover effective patterns behind the better portfolio optimization strategies; and (4) we use eigenvalue analysis to study the convergence properties of the environment model.
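To make the modeling idea concrete, here is a minimal sketch of a heavy-tail-preserving flow, assuming a Real NVP-style affine coupling architecture with a Student-t base distribution; the authors' exact architecture may differ. The key point is that a Lipschitz flow applied to a Gaussian base stays light-tailed, so the heavy tail is placed in the base distribution and carried through the coupling layers.

```python
# A minimal sketch, not the paper's exact architecture: a Real NVP-style
# affine coupling flow with a Student-t base. Because a Lipschitz flow of a
# Gaussian base remains light-tailed, the heavy tail must live in the base.
import torch
import torch.nn as nn

class AffineCoupling(nn.Module):
    """One coupling layer: y1 = x1, y2 = x2 * exp(s(x1)) + t(x1)."""
    def __init__(self, dim, hidden=64):
        super().__init__()
        self.half = dim // 2
        self.net = nn.Sequential(
            nn.Linear(self.half, hidden), nn.ReLU(),
            nn.Linear(hidden, 2 * (dim - self.half)),
        )

    def forward(self, x):
        x1, x2 = x[:, :self.half], x[:, self.half:]
        s, t = self.net(x1).chunk(2, dim=-1)
        s = torch.tanh(s)                     # bounded log-scale for stability
        y2 = x2 * torch.exp(s) + t
        return torch.cat([x1, y2], dim=-1), s.sum(dim=-1)  # output, log|det J|

class HeavyTailFlow(nn.Module):
    """Maps data to a Student-t base; a low df keeps the tails heavy."""
    def __init__(self, dim, n_layers=4, df=3.0):
        super().__init__()
        self.layers = nn.ModuleList(AffineCoupling(dim) for _ in range(n_layers))
        self.base = torch.distributions.StudentT(df)

    def log_prob(self, x):
        total = x.new_zeros(x.shape[0])
        for layer in self.layers:
            x, log_det = layer(x)
            x = torch.flip(x, dims=[-1])      # permute so both halves transform
            total = total + log_det
        return self.base.log_prob(x).sum(dim=-1) + total
```

Training would maximize `HeavyTailFlow(dim).log_prob(batch).mean()` over observed state transitions; as the degrees of freedom grow, the base degenerates to a Gaussian and the heavy-tail guarantee disappears.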
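Of the evaluation metrics the abstract mentions, maximum drawdown is named explicitly, and the Sharpe ratio appears in the paper's background material. The standard formulations are sketched below for reference; the back-testing system's exact conventions (risk-free rate, annualization factor) are assumptions here.

```python
# Standard metric formulations; the paper's back-testing system may use
# slightly different conventions (e.g. risk-free rate, trading calendar).
import numpy as np

def sharpe_ratio(returns, risk_free=0.0, periods_per_year=252):
    """Annualized Sharpe ratio of a series of per-period simple returns."""
    excess = np.asarray(returns) - risk_free / periods_per_year
    return np.sqrt(periods_per_year) * excess.mean() / excess.std(ddof=1)

def max_drawdown(equity_curve):
    """Largest peak-to-trough decline of a portfolio value series."""
    equity = np.asarray(equity_curve, dtype=float)
    running_peak = np.maximum.accumulate(equity)
    drawdowns = (equity - running_peak) / running_peak
    return drawdowns.min()   # negative; e.g. -0.25 means a 25% drawdown
```

For example, `max_drawdown([100, 120, 90, 130])` returns `-0.25`, the 25% peak-to-trough loss between the second and third observations.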
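Finally, the eigenvalue analysis of the environment model can, in one standard form, be carried out by linearizing the learned dynamics and inspecting the spectrum of the Jacobian; the abstract does not spell out the paper's exact procedure, so the snippet below is illustrative. For a discrete-time map s' = f(s), a spectral radius below one at a fixed point implies local convergence.

```python
# Illustrative only: linearize a learned dynamics model around a state and
# inspect the Jacobian's eigenvalues. The `dynamics` callable stands in for
# whatever state-transition model the agent has learned (a hypothetical name).
import torch

def local_spectrum(dynamics, state):
    """Eigenvalues of the Jacobian of `dynamics` evaluated at `state`."""
    jac = torch.autograd.functional.jacobian(dynamics, state)
    return torch.linalg.eigvals(jac)

# Toy example: a linear "model" contracting toward the origin.
f = lambda s: 0.9 * s
eigs = local_spectrum(f, torch.zeros(4))
print(eigs.abs().max() < 1)   # spectral radius < 1 -> locally stable
```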
