Predicting stock trend using an integrated term frequency-inverse document frequency-based feature weight matrix with neural networks

Abstract: Financial markets offer many money-making strategies, of which stock trading is an important example. The complex, non-linear behavior of volatile stock markets attracts researchers to study their inherent patterns. Because the primary motivation for investing in such markets is to earn higher profits, potential stocks receive considerable attention through various weighting strategies that can enhance future returns. Term frequency–inverse document frequency (TF–IDF) is a statistical approach with notable applications in the financial domain for information retrieval from textual data; it quantifies the importance of a term in a given document of a corpus. However, the application of TF–IDF to numerical data has been explored only to a limited extent. In this article, we propose to extend the applicability of TF–IDF to numerical time-series stock market data; we process the data to make them suitable for TF–IDF. We use this statistical approach to derive a feature weight matrix from historical stock market data and then integrate it with widely explored neural network architectures, namely backpropagation neural network (BPNN), long short-term memory (LSTM), and gated recurrent unit (GRU), to predict the stock market trend. Simulation results show that the proposed integrated approach, which combines the TF–IDF-based feature weight matrix with neural networks, outperforms the recent approaches considered. The results are statistically supported with a p-value less than 0.01 using a Wilcoxon signed-rank test, and the proposed approach is accompanied by illustrative examples to aid understanding. Remarks on the conclusions and potential future scope are also discussed.
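
To make the idea of applying TF–IDF to numerical series concrete, the following is a minimal sketch and not the authors' procedure. The abstract does not say how the numerical data are prepared, so everything here is an assumption for illustration: daily returns are discretized into equal-width bins that play the role of "terms", non-overlapping windows of days play the role of "documents", and the helper names `discretize`, `make_windows`, and `tfidf_matrix` are hypothetical.

```python
import math
from collections import Counter


def discretize(values, n_bins):
    """Map each numeric value to an integer bin id using equal-width bins.

    Assumption: equal-width binning is only one possible way to turn
    numbers into discrete "terms"; the source does not specify one.
    """
    lo, hi = min(values), max(values)
    width = (hi - lo) / n_bins or 1.0  # avoid division by zero for flat series
    return [min(int((v - lo) / width), n_bins - 1) for v in values]


def make_windows(terms, size):
    """Split a term sequence into non-overlapping windows ("documents")."""
    return [terms[i:i + size] for i in range(0, len(terms) - size + 1, size)]


def tfidf_matrix(documents, vocabulary):
    """Return a documents x vocabulary matrix of TF-IDF weights.

    TF  = count of term in document / document length
    IDF = log(number of documents / number of documents containing term)
    """
    n_docs = len(documents)
    doc_freq = Counter()
    for doc in documents:
        doc_freq.update(set(doc))  # count each term once per document

    matrix = []
    for doc in documents:
        counts = Counter(doc)
        row = []
        for term in vocabulary:
            tf = counts[term] / len(doc)
            idf = math.log(n_docs / doc_freq[term]) if doc_freq[term] else 0.0
            row.append(tf * idf)
        matrix.append(row)
    return matrix


# Hypothetical closing prices, used only to exercise the functions.
prices = [100.0, 101.0, 103.0, 102.0, 102.0, 104.0, 107.0, 106.0, 105.0, 105.0]
returns = [(b - a) / a for a, b in zip(prices, prices[1:])]

n_bins = 3
terms = discretize(returns, n_bins)      # one bin id per day
documents = make_windows(terms, size=3)  # three-day windows
weights = tfidf_matrix(documents, vocabulary=list(range(n_bins)))

for window, row in zip(documents, weights):
    print(window, [round(w, 4) for w in row])
```

Each row of `weights` could then serve as a weighted feature vector for one window and be passed to a downstream model. Note that with this plain IDF definition, a bin that occurs in every window receives weight zero, which is one reason smoothed IDF variants are often preferred in practice.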
