Text Categorization System for Stock Prediction

Because the market environment is complex, it is very difficult to predict stock movements accurately by data analysis alone. This paper uses text categorization to predict the trend of a stock. We divide the text categorization process into three steps: text representation, feature selection, and text categorization. We compare several categorization methods under different feature selection methods and feature space sizes. The results show that the SVM method with Information Gain feature selection and a feature space of 1000 features achieves better performance than the other configurations compared when predicting stock trends from news.
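The feature selection step can be illustrated with a minimal sketch of Information Gain scoring over binary term-presence features. This is not the paper's implementation: the toy headlines, the "up"/"down" labels, and the function names `entropy`, `information_gain`, and `select_features` are all hypothetical, and the sketch keeps only 2 terms where the paper reports using 1000. The classification step (an SVM trained on the selected features) is not shown.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    if total == 0:
        return 0.0
    counts = Counter(labels)
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


def information_gain(docs, labels, term):
    """Information Gain of a binary term-presence feature w.r.t. the labels."""
    present = [y for d, y in zip(docs, labels) if term in d]
    absent = [y for d, y in zip(docs, labels) if term not in d]
    n = len(labels)
    # Entropy of the labels after splitting on presence/absence of the term.
    conditional = (len(present) / n) * entropy(present) \
        + (len(absent) / n) * entropy(absent)
    return entropy(labels) - conditional


def select_features(docs, labels, k):
    """Return the k terms with the highest Information Gain."""
    vocabulary = sorted(set().union(*docs))
    scored = [(information_gain(docs, labels, t), t) for t in vocabulary]
    # Highest score first; ties are broken alphabetically for determinism.
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [t for _, t in scored[:k]]


# Hypothetical toy corpus: news headlines labeled with the stock trend.
raw = [
    ("profit surges on strong sales", "up"),
    ("shares rally after profit beat", "up"),
    ("company reports loss on weak sales", "down"),
    ("shares slump after loss warning", "down"),
]
# Text representation: each document becomes the set of terms it contains.
docs = [set(text.split()) for text, _ in raw]
labels = [label for _, label in raw]

# Feature selection: keep the k most informative terms (k=2 here).
top_terms = select_features(docs, labels, k=2)

# Binary feature vectors over the selected terms, ready for a classifier.
vectors = [[1 if t in d else 0 for t in top_terms] for d in docs]
```

In this toy corpus, terms that occur only in one class score highest, while terms spread evenly across both classes score zero, which is the behavior that makes Information Gain useful for shrinking the feature space before classification.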
