Sentiment Classification of Financial News Using Statistical Features

Sentiment classification of financial news identifies positive and negative news articles so that the results can be used in decision support systems for stock trend prediction. This paper explores several feature spaces for sentiment classification of news articles. Features are extracted with three N-gram models (unigram, bigram, and the combination of unigram and bigram) and weighted with traditional feature weighting methods: binary, term frequency (TF), and term frequency-inverse document frequency (TF-IDF). Document frequency (DF) is used as a feature selection criterion to generate feature spaces of different dimensions, on which the N-gram models and the weighting methods are evaluated. We measure the classification accuracy of a support vector machine (SVM) with two kernels, linear and Gaussian radial basis function (RBF). The results indicate that feature selection and feature weighting play a substantial role in sentiment classification. In particular, the proposed approach, which combines unigrams and bigrams with TF-IDF weighting and an optimized RBF-kernel SVM, produced high classification accuracy on financial news.
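The feature construction described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function names, the whitespace tokenization, the toy headlines, and the IDF form log(N / DF) are all assumptions made for illustration, and the SVM step is left out so that the example depends only on the Python standard library.

```python
import math
from collections import Counter


def ngrams(tokens, use_unigrams=True, use_bigrams=True):
    """Return unigram and/or bigram features for one tokenized document."""
    feats = []
    if use_unigrams:
        feats.extend(tokens)
    if use_bigrams:
        feats.extend(f"{a} {b}" for a, b in zip(tokens, tokens[1:]))
    return feats


def build_vocabulary(docs_feats, min_df=1):
    """Count document frequency (DF) and keep features with DF >= min_df."""
    df = Counter()
    for feats in docs_feats:
        df.update(set(feats))  # each document counts a feature at most once
    vocab = sorted(f for f, n in df.items() if n >= min_df)
    return vocab, df


def weight(feats, vocab, df, n_docs, scheme="tfidf"):
    """Turn one document's features into a vector over vocab."""
    counts = Counter(feats)
    vec = []
    for f in vocab:
        tf = counts[f]
        if scheme == "binary":
            vec.append(1.0 if tf else 0.0)
        elif scheme == "tf":
            vec.append(float(tf))
        elif scheme == "tfidf":
            # Assumed IDF variant: log(N / DF); other variants exist.
            vec.append(tf * math.log(n_docs / df[f]))
        else:
            raise ValueError(f"unknown weighting scheme: {scheme}")
    return vec


# Hypothetical toy headlines, not data from the paper.
docs = [
    "profit rises as sales grow",
    "profit falls as sales drop",
    "shares rise on strong profit",
]
docs_feats = [ngrams(d.split()) for d in docs]

# Raising min_df shrinks the feature space; min_df=2 is an arbitrary choice.
vocab, df = build_vocabulary(docs_feats, min_df=2)
matrix = [weight(f, vocab, df, len(docs), "tfidf") for f in docs_feats]
```

Varying `min_df` yields feature spaces of different dimensions, and switching `scheme` or the `use_unigrams` and `use_bigrams` flags covers the weighting methods and N-gram models being compared. The resulting vectors would then be passed to a linear or RBF-kernel SVM from a library of choice.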
