Text Mining of Stocktwits Data for Predicting Stock Prices

Stock price prediction can be improved by considering both price fluctuations and people's sentiments. Few models understand financial jargon, and few labelled datasets relate text to stock price change. To overcome this challenge, we introduce FinALBERT, an ALBERT-based model trained for text classification in the financial domain, using Stocktwits text data labelled according to stock price change. We collected more than ten years of Stocktwits data for 25 companies, including the five FAANG companies (Facebook, Amazon, Apple, Netflix, Google). These datasets were labelled with three labelling techniques based on stock price changes, and our proposed model, FinALBERT, is fine-tuned on these labels to achieve the best results. We also trained traditional machine learning, BERT, and FinBERT models on the labelled dataset, which helped us understand how the labels behave across different model architectures. The advantage of our labelling method is that it supports effective analysis of historical data, and its mathematical function can easily be customised to predict stock movement. The code and data are available from https://mkhushi.github.io/.
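As a rough illustration of labelling text by stock price change, the sketch below assigns each post a binary label from the relative change in closing price on the day it was posted. The abstract does not specify the three labelling techniques, so the threshold rule, the function names (`label_by_price_change`, `label_posts`), and the data layout are all assumptions made for illustration and not the authors' method.

```python
from typing import Dict, List, Tuple


def label_by_price_change(prev_close: float, close: float,
                          threshold: float = 0.0) -> int:
    """Hypothetical rule: 1 if the relative price change exceeds
    `threshold`, otherwise 0. Not taken from the paper."""
    if prev_close <= 0:
        raise ValueError("prev_close must be positive")
    change = (close - prev_close) / prev_close
    return 1 if change > threshold else 0


def label_posts(posts: List[Tuple[str, str]],
                closes: Dict[str, Tuple[float, float]],
                threshold: float = 0.0) -> List[Tuple[str, int]]:
    """Attach a label to each (date, text) post.

    `closes` maps a date string to (previous close, close).
    Posts on dates without price data are skipped.
    """
    labelled = []
    for date, text in posts:
        if date not in closes:
            continue  # no trading data for this date (assumed to be skipped)
        prev_close, close = closes[date]
        labelled.append((text, label_by_price_change(prev_close, close, threshold)))
    return labelled


# Made-up example data, for illustration only.
posts = [
    ("2020-01-02", "$AAPL looking strong today"),
    ("2020-01-03", "$AAPL selling off hard"),
    ("2020-01-04", "weekend thoughts on $AAPL"),
]
closes = {
    "2020-01-02": (100.0, 102.0),
    "2020-01-03": (102.0, 99.0),
}
labelled = label_posts(posts, closes)
```

Changing `threshold`, or replacing the rule inside `label_by_price_change`, is one way such a labelling function could be customised; the resulting (text, label) pairs could then be used to fine-tune a text classifier.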
