Evaluating Feature Sets and Classifiers for Sentiment Analysis of Financial News

Work on sentiment analysis has thus far been limited in the news article domain. The main causes are 1) the lack of a clearly defined target in news articles, 2) the difficulty of separating good and bad news from positive and negative sentiment, and 3) the apparent need for, and complexity of, domain-specific interpretations and background knowledge. In this paper we propose, define, experiment with, and evaluate four feature categories, comprising 26 article features, for sentiment analysis. Using five machine learning methods, we train sentiment classifiers on Norwegian financial internet news articles and achieve classification precision of up to ~71%. This is comparable to the state of the art in other domains and close to the human baseline. Our experiments with different feature subsets show that the category relying on domain-specific sentiment lexica (the 'contextual' category), which captures the jargon used in Norwegian financial news, is of cardinal importance in classification: these features yield a precision increase of ~21% when added to the other feature categories. When comparing machine learning classifiers, we find that J48 classification trees yield the highest performance, closely followed by Random Forests (RF). This is in line with recent studies and contrary to the outdated notion that Support Vector Machines (SVMs) are superior in this domain.
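The feature-subset comparison described above can be illustrated with a minimal sketch. It assumes that precision is averaged over the sentiment classes with weights equal to class support, which the abstract does not state. The function name `weighted_precision`, the three class labels, and the toy predictions are all hypothetical and invented for illustration; they are not data or code from the paper.

```python
from collections import Counter


def weighted_precision(y_true, y_pred):
    """Per-class precision, averaged with weights equal to class support.

    Assumption: the abstract does not say how precision is averaged over
    classes; support-weighted averaging is used here only as an example.
    """
    support = Counter(y_true)
    total = len(y_true)
    score = 0.0
    for label, n in support.items():
        # Number of articles the classifier assigned to this class.
        predicted = sum(1 for p in y_pred if p == label)
        # Number of those assignments that match the true label.
        correct = sum(1 for t, p in zip(y_true, y_pred) if t == p == label)
        class_precision = correct / predicted if predicted else 0.0
        score += class_precision * n / total
    return score


# Toy labels, invented for illustration only (not data from the paper).
y_true = ["pos", "pos", "neg", "neg", "neu", "neu"]
pred_without_contextual = ["pos", "neu", "neg", "pos", "neu", "neg"]
pred_with_contextual = ["pos", "pos", "neg", "neg", "neu", "pos"]

p_without = weighted_precision(y_true, pred_without_contextual)
p_with = weighted_precision(y_true, pred_with_contextual)

# Difference in percentage points between the two feature subsets.
gain_points = (p_with - p_without) * 100
print(f"without contextual: {p_without:.3f}")
print(f"with contextual:    {p_with:.3f}")
print(f"gain (points):      {gain_points:.1f}")
```

In an ablation of this kind, the same classifier is trained once per feature subset and the precision difference is attributed to the added category. The sketch only shows the scoring step, not the training of J48, RF, or SVM models.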
