Feature Selection and Weighting Methods in Sentiment Analysis

Sentiment analysis is the task of identifying whether the opinion expressed in a document is positive or negative about a given topic. Unfortunately, many of the potential applications of sentiment analysis are currently infeasible due to the huge number of features found in standard corpora. In this paper we systematically evaluate a range of feature selectors and feature weights with both Naı̈ve Bayes and Support Vector Machine classifiers. This includes the introduction of two new feature selection methods and three new feature weighting methods. Our results show that it is possible to maintain a state-of-the art classification accuracy of 87.15% while using less than 36% of the features.

[1]  Kathleen R. McKeown,et al.  Predicting the semantic orientation of adjectives , 1997 .

[2]  Thorsten Joachims,et al.  Making large-scale support vector machine learning practical , 1999 .

[3]  Bo Pang,et al.  Thumbs up? Sentiment Classification using Machine Learning Techniques , 2002, EMNLP.

[4]  Fabrizio Sebastiani,et al.  Machine learning in automated text categorization , 2001, CSUR.

[5]  Peter D. Turney Thumbs Up or Thumbs Down? Semantic Orientation Applied to Unsupervised Classification of Reviews , 2002, ACL.

[6]  Bo Pang,et al.  A Sentimental Education: Sentiment Analysis Using Subjectivity Summarization Based on Minimum Cuts , 2004, ACL.

[7]  Nigel Collier,et al.  Sentiment Analysis using Support Vector Machines with Diverse Information Sources , 2004, EMNLP.

[8]  Eric Brill,et al.  Reducing the human overhead in text categorization , 2006, KDD '06.

[9]  Andrea Esuli,et al.  SENTIWORDNET: A Publicly Available Lexical Resource for Opinion Mining , 2006, LREC.

[10]  Alistair Kennedy,et al.  SENTIMENT CLASSIFICATION of MOVIE REVIEWS USING CONTEXTUAL VALENCE SHIFTERS , 2006, Comput. Intell..

[11]  Robert J. Hilderman,et al.  Categorical Proportional Difference: A Feature Selection Method for Text Categorization , 2008, AusDM.

[12]  Hsinchun Chen,et al.  Sentiment analysis in multiple languages: Feature selection for opinion classification in Web forums , 2008, TOIS.

[13]  Rudy Prabowo,et al.  Sentiment analysis: A combined approach , 2009, J. Informetrics.