Wealth Flow Model: Online Portfolio Selection Based on Learning Wealth Flow Matrices

This article proposes a deep learning solution to the online portfolio selection problem that learns a latent structure directly from a price time series. It introduces a novel wealth flow matrix to represent this latent structure; the matrix is subject to special regularity conditions that encode knowledge about the relative strengths of the assets in a portfolio. A wealth flow model (WFM) is then proposed to learn wealth flow matrices and maximize portfolio wealth simultaneously. Compared with existing approaches, our work has two distinctive benefits: (1) learning wealth flow matrices makes our model more generalizable than models that only predict wealth proportion vectors, and (2) the exploitation of wealth flow matrices and the exploration of wealth growth are integrated into our deep reinforcement learning algorithm for the WFM. In combination, these benefits lead to a highly effective approach that generates reasonable investment behavior, including short-term trend following, following a few losers, no self-investment, and sparse portfolios. Extensive experiments on five benchmark datasets from real-world stock markets confirm the theoretical advantage of the WFM, which achieves Pareto improvements across multiple performance indicators and steady growth of wealth relative to state-of-the-art algorithms.
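
For readers unfamiliar with the setting, the quantity being maximized can be illustrated with the standard online portfolio selection bookkeeping: in each period the wealth is multiplied by the inner product of the wealth proportion vector and the vector of price relatives. The sketch below is not the WFM itself and does not use wealth flow matrices; the function name `cumulative_wealth`, its parameters, and the toy data are assumptions chosen for illustration, and transaction costs are ignored.

```python
import numpy as np


def cumulative_wealth(price_relatives, portfolios, initial_wealth=1.0):
    """Wealth trajectory of a portfolio sequence (illustrative helper).

    price_relatives: array of shape (T, n); entry [t, i] is the ratio of
        asset i's closing price in period t to its previous closing price.
    portfolios: array of shape (T, n); row t is the wealth proportion
        vector held during period t (non-negative, sums to 1).
    Returns an array of length T with the wealth after each period.
    """
    x = np.asarray(price_relatives, dtype=float)
    b = np.asarray(portfolios, dtype=float)
    if x.shape != b.shape:
        raise ValueError("price_relatives and portfolios must share a shape")
    if np.any(b < 0) or not np.allclose(b.sum(axis=1), 1.0):
        raise ValueError("each portfolio must lie on the probability simplex")
    # Per-period growth factor: inner product of proportions and relatives.
    period_growth = np.sum(b * x, axis=1)
    # Wealth compounds multiplicatively over periods.
    return initial_wealth * np.cumprod(period_growth)


# Toy data (made up): two assets, two periods, uniform rebalancing.
toy_relatives = [[1.1, 0.9], [1.0, 1.2]]
toy_portfolios = [[0.5, 0.5], [0.5, 0.5]]
wealth = cumulative_wealth(toy_relatives, toy_portfolios)
```

A sparse portfolio in this notation is simply a row of `portfolios` with most entries equal to zero.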