Domain Adaptive Multi-Modality Neural Attention Network for Financial Forecasting

Financial time series analysis plays a central role in optimizing investment decisions and hedging market risks. This is a challenging task, as such problems are typically accompanied by dual-level (i.e., data-level and task-level) heterogeneity. For instance, in stock price forecasting, a successful portfolio with bounded risks usually consists of a large number of stocks from diverse domains (e.g., utilities, information technology, and healthcare), and forecasting the stocks in each domain can be treated as one task, which gives rise to the task-level heterogeneity; within a portfolio, each stock is characterized by temporal data collected from multiple modalities (e.g., finance, weather, and news), which corresponds to the data-level heterogeneity. Furthermore, the finance industry follows highly regulated processes, which require prediction models to be interpretable and their outputs to meet compliance requirements. Therefore, a natural research question is how to build a model that achieves satisfactory performance on such multi-modality multi-task learning problems while providing comprehensive explanations for end users. To answer this question, we propose a generic time series forecasting framework named Dandelion, which leverages the consistency of multiple modalities and explores the relatedness of multiple tasks using a deep neural network. In addition, to ensure the interpretability of the framework, we integrate a novel trinity attention mechanism, which allows end users to investigate variable importance over three dimensions (i.e., task, modality, and time). Extensive empirical results demonstrate that Dandelion achieves superior performance for financial market prediction across 396 stocks from 4 different domains over the past 15 years. In particular, two case studies show the efficacy of Dandelion in terms of its profitability and the interpretability of its outputs to end users.
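The abstract does not specify how the trinity attention is computed. As a purely illustrative sketch, the Python below assumes a factorized design. In this design, hypothetical raw relevance scores indexed by task, modality, and time are normalized with a softmax at each level. The sketch shows how such weights could be read off as variable importance along the three dimensions. The names `trinity_weights` and `joint_importance`, the mean-pooling used to score modalities and tasks, and the toy scores are all assumptions, not Dandelion's actual formulation.

```python
import math
from typing import List

# Hypothetical illustration of three-axis (task, modality, time) attention;
# NOT the Dandelion paper's actual formulation.

def softmax(xs: List[float]) -> List[float]:
    """Numerically stable softmax over a list of scores."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def trinity_weights(scores: List[List[List[float]]]):
    """Factorize raw scores[task][modality][time] into three attention levels.

    Assumed design: modalities and tasks are scored by mean-pooling their raw scores.
    Returns (task_w, modality_w, time_w) where
      task_w[t]         sums to 1 over tasks,
      modality_w[t][m]  sums to 1 over modalities within task t,
      time_w[t][m][l]   sums to 1 over time steps within (task t, modality m).
    """
    time_w = [[softmax(series) for series in task] for task in scores]
    modality_w = [softmax([sum(s) / len(s) for s in task]) for task in scores]
    task_means = [sum(sum(s) for s in task) / sum(len(s) for s in task) for task in scores]
    task_w = softmax(task_means)
    return task_w, modality_w, time_w

def joint_importance(task_w, modality_w, time_w):
    """Importance of each (task, modality, time) cell; the cells sum to 1 overall."""
    return [[[task_w[t] * modality_w[t][m] * w for w in time_w[t][m]]
             for m in range(len(modality_w[t]))]
            for t in range(len(task_w))]

# Toy example: 2 tasks (domains), 2 modalities (finance, news), 3 time steps.
scores = [
    [[0.1, 0.5, 2.0], [0.0, 0.0, 0.0]],  # task 0: finance dominates, late steps matter
    [[0.0, 0.0, 0.0], [1.0, 1.0, 1.0]],  # task 1: news dominates, steps equally weighted
]
tw, mw, lw = trinity_weights(scores)
imp = joint_importance(tw, mw, lw)
print("task weights:", [round(w, 3) for w in tw])
print("modality weights per task:", [[round(w, 3) for w in row] for row in mw])
```

Each level is a softmax nested inside the level above it. As a result, the product of the three weights forms a distribution over all (task, modality, time) cells. An end user could then ask, for example, which modality drove the forecast for one domain and at which time steps.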
