Evaluation of a General-Purpose Sentiment Lexicon on a Product Review Corpus

This paper introduces a new general-purpose sentiment lexicon, the WKWSCI Sentiment Lexicon, and compares it with three existing lexicons. The WKWSCI Sentiment Lexicon is based on the 6of12dict lexicon and currently covers adjectives, adverbs and verbs. Each word was manually coded with a value on a 7-point sentiment strength scale. The effectiveness of the four sentiment lexicons for document-level and sentence-level sentiment categorization was evaluated using an Amazon product review dataset. The WKWSCI lexicon obtained the best results for document-level sentiment categorization, with an accuracy of 75%. The Hu & Liu lexicon obtained the best results for sentence-level sentiment categorization, with an accuracy of 77%. By comparison, the best bag-of-words machine learning model obtained an accuracy of 82% for document-level sentiment categorization. The strength of the lexicon-based method lies in sentence-level and aspect-based sentiment analysis, where machine learning is difficult to apply because of the small number of features.
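
As a rough illustration of what lexicon-based sentiment categorization involves, the following minimal sketch sums per-word sentiment scores and labels a text by the sign of the total. It is not the paper's method: the toy lexicon, the -3 to +3 coding of the 7-point scale, the tokenizer, and the function names `score_text` and `categorize` are all assumptions made for illustration.

```python
import re

# Hypothetical toy lexicon (not taken from any of the four lexicons).
# Scores use an assumed -3..+3 coding of a 7-point strength scale.
TOY_LEXICON = {
    "excellent": 3, "good": 2, "nice": 1,
    "poor": -2, "terrible": -3, "disappointing": -2,
}

def score_text(text, lexicon=TOY_LEXICON):
    """Sum the lexicon scores of all tokens in the text; unknown words score 0."""
    tokens = re.findall(r"[a-z']+", text.lower())
    return sum(lexicon.get(tok, 0) for tok in tokens)

def categorize(text, lexicon=TOY_LEXICON):
    """Label a text as positive, negative, or neutral by the sign of its score."""
    score = score_text(text, lexicon)
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"
```

The same function can be applied to a whole review or to a single sentence, which is one reason a lexicon-based approach remains usable when a text offers only a few features.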
