A comprehensive cluster and classification mining procedure for daily stock market return forecasting

Abstract Data mining and big data analytic techniques play an important role in many application fields, including the financial markets. However, only a few studies have focused on predicting daily stock market returns, and the data mining procedures used in these studies are either incomplete or inefficient. This paper presents a comprehensive data mining process to forecast the daily direction of the S&P 500 Index ETF (SPY) return based on 60 financial and economic features. The fuzzy c-means (FCM) method is first used to cluster the preprocessed data. Principal component analysis (PCA) is then applied to the entire data set and to each of the seven clusters. The dimension of the entire cleaned data set is then reduced according to the combined results from the entire data set and each cluster. Corresponding to different levels of dimensionality reduction, twelve new data sets are generated from the entire cleaned data. Artificial neural networks (ANNs) and logistic regression models are then used with the twelve transformed data sets for classification, both to forecast the daily direction of future market returns and to indicate the efficiency of dimensionality reduction with PCA. A group of hypothesis tests is performed over the classification and simulation results to show that the ANNs give significantly higher classification accuracy than logistic regression, and that trading strategies guided by the comprehensive cluster and classification mining procedure based on PCA and ANNs gain higher risk-adjusted profits than the comparison benchmarks, as well as than strategies guided by forecasts based on PCA and logistic regression models.
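The first two stages named in the abstract, fuzzy c-means clustering followed by PCA-based dimensionality reduction, can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: the synthetic two-blob data, the fuzzifier `m=2.0`, the 95% variance threshold, and the helper names `fuzzy_c_means` and `pca_reduce` are all assumptions made for illustration. The paper itself works with 60 features and seven clusters, and how it combines the per-cluster PCA results is not specified in the abstract, so that step is omitted here.

```python
import numpy as np


def fuzzy_c_means(X, n_clusters, m=2.0, n_iter=100, tol=1e-6, seed=0):
    """Standard fuzzy c-means; returns (centers, memberships). Requires m > 1."""
    rng = np.random.default_rng(seed)
    # Random initial memberships; each row is normalized to sum to 1.
    U = rng.random((X.shape[0], n_clusters))
    U /= U.sum(axis=1, keepdims=True)
    for _ in range(n_iter):
        # Centers are means of the samples weighted by membership^m.
        W = U ** m
        centers = (W.T @ X) / W.sum(axis=0)[:, None]
        # Euclidean distance from every sample to every center,
        # floored to avoid division by zero.
        dist = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2)
        dist = np.maximum(dist, 1e-12)
        # Membership update: u_ik proportional to d_ik^(-2 / (m - 1)).
        inv = dist ** (-2.0 / (m - 1.0))
        U_new = inv / inv.sum(axis=1, keepdims=True)
        converged = np.abs(U_new - U).max() < tol
        U = U_new
        if converged:
            break
    return centers, U


def pca_reduce(X, var_ratio=0.95):
    """Project X onto the fewest principal components that explain
    at least var_ratio of the total variance; returns (scores, k)."""
    Xc = X - X.mean(axis=0)
    _, s, Vt = np.linalg.svd(Xc, full_matrices=False)
    explained = s ** 2 / np.sum(s ** 2)
    k = int(np.searchsorted(np.cumsum(explained), var_ratio)) + 1
    k = min(k, len(s))
    return Xc @ Vt[:k].T, k


# Synthetic stand-in for the preprocessed feature matrix (assumption):
# two well-separated Gaussian blobs in five dimensions.
rng = np.random.default_rng(42)
X = np.vstack([rng.normal(0.0, 1.0, (100, 5)),
               rng.normal(4.0, 1.0, (100, 5))])

centers, U = fuzzy_c_means(X, n_clusters=2)
labels = U.argmax(axis=1)          # hard assignment from soft memberships
Z, k = pca_reduce(X, var_ratio=0.95)
```

In a full pipeline along the lines the abstract describes, the reduced matrix `Z` (or several such matrices at different variance thresholds) would then be passed to a classifier such as a neural network or logistic regression.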
