Use of negation phrases in automatic sentiment classification of product reviews

This paper reports a study in automatic sentiment classification, i.e., automatically classifying documents as expressing positive or negative sentiment. The study investigates the effectiveness of applying a machine-learning algorithm, the support vector machine (SVM), to various text features in order to classify on-line product reviews as recommended (positive sentiment) or not recommended (negative sentiment). The first part of the study investigated several approaches: unigrams (individual words), selected words (such as verbs, adjectives, and adverbs), and words labeled with part-of-speech tags. Using SVM, the unigram approach obtained an accuracy rate of around 76%. Error analysis suggested several ways to improve classification accuracy: handling negation phrases, making inferences from superficial words, and handling comments that address only parts of the product. The second part of the study investigated the use of negation phrase n-grams to improve classification accuracy. This approach increased the accuracy rate to 79.33%. Compared with traditional subject classification, which mainly uses unigrams, syntactic and semantic processing of text appears to be more important for sentiment classification. We expect that deeper linguistic processing will help increase the accuracy of sentiment classification.
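
To make the idea of negation phrase n-grams concrete, the following is a minimal sketch of one way such features might be extracted. It is not the procedure used in the study: the cue list `NEGATION_CUES`, the `window` parameter, the choice to replace (rather than supplement) the merged unigrams, and the function names are all assumptions made for illustration.

```python
import re

# Hypothetical negation cue list; the study's actual list is not given here.
NEGATION_CUES = {"not", "no", "never", "cannot", "n't"}


def tokenize(text):
    """Lowercase the text and split it into word tokens.

    The clitic "n't" is split off as its own token ("don't" -> "do", "n't").
    """
    text = text.lower().replace("n't", " n't")
    return re.findall(r"n't|[a-z]+", text)


def negation_features(text, window=1):
    """Return tokens in which each negation cue is merged with what follows.

    A cue and up to `window` following tokens become a single feature,
    e.g. "not good" -> "not_good". All other tokens stay as unigrams.
    """
    tokens = tokenize(text)
    features = []
    i = 0
    while i < len(tokens):
        if tokens[i] in NEGATION_CUES and i + 1 < len(tokens):
            phrase = tokens[i:i + 1 + window]
            features.append("_".join(phrase))
            i += len(phrase)  # skip the tokens absorbed into the phrase
        else:
            features.append(tokens[i])
            i += 1
    return features


if __name__ == "__main__":
    print(negation_features("This phone is not good"))
    print(negation_features("I don't recommend it"))
```

With features of this kind, "good" and "not_good" become distinct dimensions, so a classifier such as an SVM can assign them opposite weights instead of treating both reviews as containing the same positive word.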
