Extraction of the Essential Constituents of the S&P 500 Index

The S&P 500 index is a leading indicator of the stock market and of U.S. equities, and it is highly influenced by its essential constituents. Traditionally, such constituents are identified by the market capitalization weighting scheme. However, the literature rejects the efficiency of this weighting scheme. As an alternative, we introduce two data mining approaches, entropy and rough sets, as separate methods for extracting the essential S&P 500 constituents. The legitimacy of the findings, in comparison with the S&P 500 weighting scheme, is investigated using discrete-time Markov Chain Models (MCM) and Hidden Markov Chain Models (HMCM), which lend themselves easily to the nature of time-series data. The investigation uses data collected over a 16-year period, covering the full sample and the pre-crisis and post-crisis subsamples. We find that the entropy method provides the highest forecasting accuracy for the full sample and the post-crisis subsample.
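
To make the entropy idea concrete, the following is a minimal sketch of one common entropy-based selection criterion: ranking candidate constituents by information gain with respect to the index direction. It is an illustration under stated assumptions, not the procedure used in the paper: the discretization of daily moves into "up"/"down", the tickers `AAA`, `BBB`, `CCC`, and the toy data are all hypothetical.

```python
from collections import Counter
from math import log2


def entropy(labels):
    """Shannon entropy (in bits) of a sequence of discrete labels."""
    n = len(labels)
    return -sum((c / n) * log2(c / n) for c in Counter(labels).values())


def information_gain(feature, target):
    """Reduction in the entropy of `target` after splitting on `feature`."""
    n = len(target)
    groups = {}
    for f, t in zip(feature, target):
        groups.setdefault(f, []).append(t)
    conditional = sum(len(g) / n * entropy(g) for g in groups.values())
    return entropy(target) - conditional


# Hypothetical discretized daily moves of the index (assumption: two states).
index_moves = ["up", "down", "up", "up", "down", "down", "up", "down"]

# Hypothetical constituents; tickers and data are made up for illustration.
constituent_moves = {
    "AAA": ["up", "down", "up", "up", "down", "down", "up", "down"],
    "BBB": ["up", "up", "down", "up", "down", "down", "up", "down"],
    "CCC": ["up", "up", "up", "down", "up", "down", "down", "down"],
}

# Score each constituent by how much it reduces uncertainty about the index.
gains = {
    ticker: information_gain(moves, index_moves)
    for ticker, moves in constituent_moves.items()
}

# Constituents with higher information gain are ranked as more "essential".
ranking = sorted(gains, key=gains.get, reverse=True)
```

In this toy setting, a constituent whose moves mirror the index receives the largest gain, and one whose moves carry no information about the index receives a gain of zero. A subset selected this way could then be evaluated by how accurately a Markov chain fitted on it forecasts the index state.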
