News Sensitive Stock Trend Prediction

Stock market prediction with data mining techniques is one of the most important issues to be investigated. In this paper, we present a system that predicts the changes of stock trend by analyzing the influence of non-quantifiable information (news articles). In particular, we investigate the immediate impact of news articles on the time series based on the Efficient Markets Hypothesis. Several data mining and text mining techniques are used in a novel way. A new statistical based piecewise segmentation algorithm is proposed to identify trends on the time series. The segmented trends are clustered into two categories, Rise and Drop, according to the slope of trends and the coefficient of determination. We propose an algorithm, which is called guided clustering, to filter news articles with the help of the clusters that we have obtained from trends. We also propose a new differentiated weighting scheme that assigns higher weights to the features if they occur in the Rise (Drop) news-article cluster but do not occur in its opposite Drop (Rise).

[1]  Padhraic J. Smyth,et al.  Hidden Markov models for fault detection in dynamic systems , 1993 .

[3]  Eamonn J. Keogh,et al.  A Probabilistic Approach to Fast Pattern Matching in Time Series Databases , 1997, KDD.

[4]  Robert V. Brill,et al.  Applied Statistics and Probability for Engineers , 2004, Technometrics.

[5]  W. Mendenhall,et al.  A second course in business statistics: Regression analysis , 1981 .

[6]  Van Rijsbergen,et al.  A theoretical basis for the use of co-occurence data in information retrieval , 1977 .

[7]  Patricia A. Adler,et al.  The Social dynamics of financial markets , 1984 .

[8]  Vladimir N. Vapnik,et al.  The Nature of Statistical Learning Theory , 2000, Statistics for Engineering and Information Science.

[9]  Thorsten Joachims,et al.  Text Categorization with Support Vector Machines: Learning with Many Relevant Features , 1998, ECML.

[10]  Soon Myoung Chung,et al.  Efficient mining of association rules in text databases , 1999, CIKM '99.

[11]  George Karypis,et al.  A Comparison of Document Clustering Techniques , 2000 .

[12]  Ali S. Hadi,et al.  Finding Groups in Data: An Introduction to Chster Analysis , 1991 .

[13]  Takenobu Tokunaga,et al.  Text Categorization based on Weighted Inverse Document Frequency , 1994 .

[14]  Chinatsu Aone,et al.  Fast and effective text mining using linear-time document clustering , 1999, KDD '99.

[15]  Thorsten Joachims,et al.  Making large scale SVM learning practical , 1998 .

[16]  Phil Mattox,et al.  The stock market , 2002 .

[17]  Christos Faloutsos,et al.  Fast subsequence matching in time-series databases , 1994, SIGMOD '94.

[18]  Charles Amos Dice,et al.  The stock market , 1952 .

[19]  Theodosios Pavlidis,et al.  Segmentation of Plane Curves , 1974, IEEE Transactions on Computers.

[20]  Yiming Yang,et al.  A re-examination of text categorization methods , 1999, SIGIR '99.

[21]  Tom Fawcett,et al.  Activity monitoring: noticing interesting changes in behavior , 1999, KDD '99.