Lexicon-based sentiment analysis: Comparative evaluation of six sentiment lexicons

This article introduces a new general-purpose sentiment lexicon called WKWSCI Sentiment Lexicon and compares it with five existing lexicons: Hu & Liu Opinion Lexicon, Multi-perspective Question Answering (MPQA) Subjectivity Lexicon, General Inquirer, National Research Council Canada (NRC) Word-Sentiment Association Lexicon and Semantic Orientation Calculator (SO-CAL) lexicon. The effectiveness of the sentiment lexicons for sentiment categorisation at the document level and sentence level was evaluated using an Amazon product review data set and a news headlines data set. WKWSCI, MPQA, Hu & Liu and SO-CAL lexicons are equally good for product review sentiment categorisation, obtaining accuracy rates of 75%–77% when appropriate weights are used for different categories of sentiment words. However, when a training corpus is not available, Hu & Liu obtained the best accuracy with a simple-minded approach of counting positive and negative words for both document-level and sentence-level sentiment categorisation. The WKWSCI lexicon obtained the best accuracy of 69% on the news headlines sentiment categorisation task, and the sentiment strength values obtained a Pearson correlation of 0.57 with human-assigned sentiment values. It is recommended that the Hu & Liu lexicon be used for product review texts and the WKWSCI lexicon for non-review texts.
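The word-counting approach mentioned in the abstract can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: the word lists `POSITIVE` and `NEGATIVE` are tiny made-up stand-ins rather than entries from any of the six lexicons, and `pos_weight` and `neg_weight` are hypothetical parameters that only hint at the idea of weighting categories of sentiment words. The paper's actual categories and weights are not given here. The `pearson` helper shows how lexicon-derived scores could be compared with human-assigned values.

```python
import math
import re

# Illustrative word lists only; NOT taken from any of the six lexicons.
POSITIVE = {"good", "great", "excellent", "love", "reliable"}
NEGATIVE = {"bad", "poor", "terrible", "hate", "broken"}


def tokenize(text):
    """Lowercase the text and split it into alphabetic tokens."""
    return re.findall(r"[a-z']+", text.lower())


def count_score(text, pos_weight=1.0, neg_weight=1.0):
    """Weighted count of positive words minus weighted count of negative words.

    The weights are hypothetical; with the defaults this is plain counting.
    """
    tokens = tokenize(text)
    pos = sum(1 for t in tokens if t in POSITIVE)
    neg = sum(1 for t in tokens if t in NEGATIVE)
    return pos_weight * pos - neg_weight * neg


def classify(text, pos_weight=1.0, neg_weight=1.0):
    """Map the score to a label; a zero score is treated as neutral."""
    score = count_score(text, pos_weight, neg_weight)
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"


def pearson(xs, ys):
    """Pearson correlation between two equal-length numeric sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


if __name__ == "__main__":
    review = "Great camera, excellent battery, but the strap is bad."
    print(classify(review))                  # unweighted counting
    print(classify(review, neg_weight=3.0))  # negative words weighted more
```

Without a training corpus, only the default weights are available, which corresponds to the plain counting baseline. Tuning the weights needs labelled data, and this sketch ignores negation and intensifiers entirely.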

Christopher S. G. Khoo | Sathik Basha Johnkhan
