An Effective Clustering Approach to Stock Market Prediction

In this paper, we propose an effective clustering method, HRK (Hierarchical agglomerative and Recursive K-means clustering), to predict short-term stock price movements after the release of financial reports. The proposed method consists of three phases. First, we convert each financial report into a feature vector and use hierarchical agglomerative clustering to divide the feature vectors into clusters. Second, we recursively apply K-means clustering to partition each cluster into sub-clusters so that most feature vectors in each sub-cluster belong to the same class, and we choose the centroid of each sub-cluster as its representative feature vector. Third, we employ the representative feature vectors to predict the stock price movements. The experimental results show that the proposed method outperforms SVM in terms of accuracy and average profits.
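The three phases can be illustrated with a minimal sketch in plain Python. It is not the authors' implementation. The abstract does not specify the linkage criterion, the value of K, the stopping rule, or how the representatives are used for prediction, so the sketch assumes centroid linkage, a split into two sub-clusters at each recursion, a purity threshold of 0.9, and nearest-representative classification. All function names and the toy data are hypothetical.

```python
import math
from collections import Counter

def centroid(vectors):
    """Component-wise mean of a list of equal-length vectors."""
    return [sum(col) / len(vectors) for col in zip(*vectors)]

def agglomerative(vectors, n_clusters):
    """Phase 1 (assumed centroid linkage): merge the two closest clusters
    until n_clusters remain. Returns lists of indices into `vectors`."""
    clusters = [[i] for i in range(len(vectors))]
    while len(clusters) > n_clusters:
        cents = [centroid([vectors[i] for i in c]) for c in clusters]
        pairs = [(math.dist(cents[a], cents[b]), a, b)
                 for a in range(len(clusters))
                 for b in range(a + 1, len(clusters))]
        _, a, b = min(pairs)
        merged = clusters.pop(b)      # b > a, so index a stays valid
        clusters[a].extend(merged)
    return clusters

def two_means(vectors, idx, iters=20):
    """K-means with K=2 (an assumption), deterministic initialisation:
    the first member and the member farthest from it."""
    c0 = vectors[idx[0]]
    c1 = max((vectors[i] for i in idx), key=lambda v: math.dist(v, c0))
    left, right = list(idx), []
    for _ in range(iters):
        left = [i for i in idx
                if math.dist(vectors[i], c0) <= math.dist(vectors[i], c1)]
        in_left = set(left)
        right = [i for i in idx if i not in in_left]
        if not left or not right:
            break
        c0 = centroid([vectors[i] for i in left])
        c1 = centroid([vectors[i] for i in right])
    return left, right

def split_until_pure(vectors, labels, idx, min_purity=0.9):
    """Phase 2: split recursively until most members share one class,
    then keep (centroid, majority class) as the representative."""
    majority, n_major = Counter(labels[i] for i in idx).most_common(1)[0]
    if n_major / len(idx) < min_purity:
        left, right = two_means(vectors, idx)
        if left and right:            # otherwise the cluster cannot be split
            return (split_until_pure(vectors, labels, left, min_purity)
                    + split_until_pure(vectors, labels, right, min_purity))
    return [(centroid([vectors[i] for i in idx]), majority)]

def fit_hrk(vectors, labels, n_clusters=2, min_purity=0.9):
    """Run phases 1 and 2 and collect all representatives."""
    reps = []
    for cluster in agglomerative(vectors, n_clusters):
        reps += split_until_pure(vectors, labels, cluster, min_purity)
    return reps

def predict(reps, vector):
    """Phase 3 (assumed rule): class of the nearest representative."""
    return min(reps, key=lambda r: math.dist(r[0], vector))[1]

# Toy feature vectors standing in for converted financial reports.
X = [[0, 0], [0, 1], [1, 0], [1, 1], [10, 10], [10, 11], [12, 10], [12, 11]]
y = ["up", "up", "up", "up", "up", "up", "down", "down"]
reps = fit_hrk(X, y, n_clusters=2)
print(predict(reps, [12.5, 10.5]))
```

In this toy run the second top-level cluster mixes both classes, so the recursive step splits it further, which is the situation the second phase is meant to handle. The pairwise merge loop is cubic in the number of vectors and is written for readability rather than speed.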
