Prediction of stock price direction using a hybrid GA-XGBoost algorithm with a three-stage feature engineering process

Abstract For several centuries, the stock market has performed one of the most important functions in a laissez-faire economic system by bringing together people, companies, and flows of money. Researchers have conducted numerous studies on predicting stock prices, and with the advent of big data and the rapid development of artificial intelligence, a growing number of them apply machine learning or deep learning techniques to stock market prediction. However, accurately predicting the direction of stock prices remains difficult because stock prices are inherently complex, nonlinear, nonstationary, and sometimes too irrational to be predictable. Despite the wealth of available information, previous prediction systems have often overlooked key indicators and the importance of feature engineering. This study proposes a hybrid GA-XGBoost prediction system with an enhanced feature engineering process consisting of three stages: feature set expansion, data preparation, and optimal feature set selection using the hybrid GA-XGBoost algorithm. It experimentally verifies the importance of the feature engineering process in stock price direction prediction by comparing the obtained feature sets with the original dataset, and it improves prediction performance enough to outperform benchmark models. Specifically, the largest gain in accuracy comes from feature expansion, which adds 67 technical indicators to the original historical stock price data. The GA-XGBoost algorithm also produces a parsimonious optimal feature set that achieves the desired performance with substantially fewer features. Consequently, this study demonstrates empirically that successful prediction performance depends largely on a deliberate combination of feature engineering processes with a baseline learning model, one that strikes a balance between the curse of dimensionality and the blessing of dimensionality.
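The feature selection stage can be illustrated with a minimal sketch of a genetic algorithm (GA) searching over binary feature masks. This is not the authors' implementation: the function names, the GA settings (population size, crossover and mutation rates, tournament selection, elitism), and the toy fitness function are all assumptions made for illustration. In a setting like the one the abstract describes, the fitness of a mask would presumably be the validation accuracy of an XGBoost classifier trained on the selected columns; here a stand-in score is used so the example runs with the standard library alone.

```python
import random
from typing import Callable, Tuple

Mask = Tuple[int, ...]  # 1 = feature kept, 0 = feature dropped


def ga_select_features(
    n_features: int,
    fitness: Callable[[Mask], float],
    pop_size: int = 20,
    generations: int = 30,
    crossover_rate: float = 0.8,
    mutation_rate: float = 0.05,
    seed: int = 0,
) -> Mask:
    """Search for a high-fitness feature mask with a simple GA (illustrative)."""
    rng = random.Random(seed)
    population = [
        tuple(rng.randint(0, 1) for _ in range(n_features))
        for _ in range(pop_size)
    ]

    for _ in range(generations):
        # Score every candidate mask once per generation.
        scored = [(fitness(mask), mask) for mask in population]
        elite = max(scored, key=lambda pair: pair[0])[1]

        def tournament() -> Mask:
            # Binary tournament: the fitter of two random candidates wins.
            a, b = rng.sample(scored, 2)
            return a[1] if a[0] >= b[0] else b[1]

        next_population = [elite]  # elitism: the best mask always survives
        while len(next_population) < pop_size:
            parent1, parent2 = tournament(), tournament()
            if n_features > 1 and rng.random() < crossover_rate:
                # Single-point crossover.
                cut = rng.randrange(1, n_features)
                child = parent1[:cut] + parent2[cut:]
            else:
                child = parent1
            # Bit-flip mutation.
            child = tuple(
                bit ^ 1 if rng.random() < mutation_rate else bit
                for bit in child
            )
            next_population.append(child)
        population = next_population

    return max(population, key=fitness)


# Hypothetical stand-in for model accuracy: features 0, 3, and 5 are
# "informative", and every selected feature carries a small cost, which
# rewards parsimonious feature sets.
INFORMATIVE = {0, 3, 5}


def toy_fitness(mask: Mask) -> float:
    hits = sum(1 for i, bit in enumerate(mask) if bit and i in INFORMATIVE)
    return hits - 0.1 * sum(mask)


best_mask = ga_select_features(10, toy_fitness)
selected = [i for i, bit in enumerate(best_mask) if bit]
print("selected feature indices:", selected)
```

The per-feature penalty in the toy fitness mirrors the trade-off the abstract points to: adding features can help (the blessing of dimensionality) but each extra feature has a cost (the curse of dimensionality), so the search is pushed toward a small set that still scores well. With a real model, each fitness call is a training run, so caching scores for masks already evaluated would likely be worthwhile.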
