Lexicon-based sentiment analysis: Comparative evaluation of six sentiment lexicons

This article introduces a new general-purpose sentiment lexicon, the WKWSCI Sentiment Lexicon, and compares it with five existing lexicons: the Hu & Liu Opinion Lexicon, the Multi-Perspective Question Answering (MPQA) Subjectivity Lexicon, the General Inquirer, the National Research Council Canada (NRC) Word-Sentiment Association Lexicon and the Semantic Orientation Calculator (SO-CAL) lexicon. The effectiveness of the lexicons for sentiment categorisation at the document level and the sentence level was evaluated using an Amazon product review data set and a news headlines data set. The WKWSCI, MPQA, Hu & Liu and SO-CAL lexicons performed equally well for product review sentiment categorisation, obtaining accuracy rates of 75%–77% when appropriate weights were used for different categories of sentiment words. When no training corpus is available, however, the Hu & Liu lexicon obtained the best accuracy using the simple approach of counting positive and negative words, for both document-level and sentence-level sentiment categorisation. The WKWSCI lexicon obtained the best accuracy, 69%, on the news headlines sentiment categorisation task, and its sentiment strength values obtained a Pearson correlation of 0.57 with human-assigned sentiment values. It is recommended that the Hu & Liu lexicon be used for product review texts and the WKWSCI lexicon for non-review texts.
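The word-counting baseline mentioned above can be illustrated with a minimal Python sketch. It is not the authors' implementation: the toy lexicon, the tokeniser, the function names and the `pos_weight`/`neg_weight` parameters are all assumptions made for illustration. A real lexicon such as Hu & Liu contains thousands of entries, and in the weighted setting the weights would be estimated from a training corpus rather than set by hand.

```python
import re

# Hypothetical toy lexicon for illustration only (+1 positive, -1 negative).
TOY_LEXICON = {
    "good": 1, "great": 1, "excellent": 1,
    "bad": -1, "poor": -1, "terrible": -1,
}


def tokenize(text):
    """Lowercase the text and split it into simple word tokens."""
    return re.findall(r"[a-z']+", text.lower())


def classify(text, lexicon=TOY_LEXICON, pos_weight=1.0, neg_weight=1.0):
    """Label a text by comparing weighted counts of positive and negative words.

    With both weights at 1.0 this is plain counting; other weights are an
    assumed stand-in for weights learned from training data.
    """
    tokens = tokenize(text)
    pos = sum(1 for t in tokens if lexicon.get(t, 0) > 0)
    neg = sum(1 for t in tokens if lexicon.get(t, 0) < 0)
    score = pos_weight * pos - neg_weight * neg
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"
```

The same function can be applied to a whole review (document level) or to a single sentence or headline (sentence level). Raising `neg_weight` shows how weighting a category of sentiment words can change the decision for a mixed text.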
