Sentiment Analysis by Augmenting Expectation Maximisation with Lexical Knowledge

Sentiment analysis of documents aims to characterise the positive or negative sentiment expressed in documents. It has been formulated as a supervised classification problem, which requires large numbers of labelled documents. Semi-supervised sentiment classification using limited documents or words labelled with sentiment-polarities are approaches to reducing labelling cost for effective learning. Expectation Maximisation (EM) has been widely used in semi-supervised sentiment classification. A prominent problem with existing EM-based approaches is that the objective function of EM may not conform to the intended classification task and thus can result in poor classification performance. In this paper we propose to augment EM with the lexical knowledge of opinion words to mitigate this problem. Extensive experiments on diverse domains show that our lexical EM algorithm achieves significantly higher accuracy than existing standard EM-based semi-supervised learning approaches for sentiment classification, and also significantly outperforms alternative approaches using the lexical knowledge.

[1]  Andrea Esuli,et al.  SentiWordNet 3.0: An Enhanced Lexical Resource for Sentiment Analysis and Opinion Mining , 2010, LREC.

[2]  Alexander Zien,et al.  Semi-Supervised Learning , 2006 .

[3]  D. Rubin,et al.  Maximum likelihood from incomplete data via the EM - algorithm plus discussions on the paper , 1977 .

[4]  Andrew McCallum,et al.  High-Performance Semi-Supervised Learning using Discriminatively Constrained Generative Models , 2010, ICML.

[5]  Janyce Wiebe,et al.  Recognizing Contextual Polarity in Phrase-Level Sentiment Analysis , 2005, HLT.

[6]  Claire Cardie,et al.  Annotating Expressions of Opinions and Emotions in Language , 2005, Lang. Resour. Evaluation.

[7]  Craig MacDonald,et al.  Overview of the TREC 2007 Blog Track , 2007, TREC.

[8]  Ellen Riloff,et al.  Learning Extraction Patterns for Subjective Expressions , 2003, EMNLP.

[9]  Ben Taskar,et al.  Expectation Maximization and Posterior Constraints , 2007, NIPS.

[10]  Prem Melville,et al.  Sentiment analysis of blogs by combining lexical knowledge with text classification , 2009, KDD.

[11]  Vasileios Hatzivassiloglou,et al.  Predicting the Semantic Orientation of Adjectives , 1997, ACL.

[12]  Bo Pang,et al.  Thumbs up? Sentiment Classification using Machine Learning Techniques , 2002, EMNLP.

[13]  Bing Liu,et al.  Sentiment Analysis and Subjectivity , 2010, Handbook of Natural Language Processing.

[14]  Peter D. Turney Thumbs Up or Thumbs Down? Semantic Orientation Applied to Unsupervised Classification of Reviews , 2002, ACL.

[15]  Philip S. Yu,et al.  Text Classification by Labeling Words , 2004, AAAI.

[16]  Gideon S. Mann,et al.  Learning from labeled features using generalized expectation criteria , 2008, SIGIR '08.

[17]  Wiltrud Kessler Turney: Thumbs Up or Thumbs Down? Semantic Orientation Applied to Unsupervised Classication of Reviews , 2012 .

[18]  Craig MacDonald,et al.  Overview of the TREC 2006 Blog Track , 2006, TREC.

[19]  Fabio Gagliardi Cozman,et al.  Risks of Semi-Supervised Learning: How Unlabeled Data Can Degrade Performance of Generative Classifiers , 2006, Semi-Supervised Learning.

[20]  Sebastian Thrun,et al.  Text Classification from Labeled and Unlabeled Documents using EM , 2000, Machine Learning.

[21]  Vikas Sindhwani,et al.  Document-Word Co-regularization for Semi-supervised Sentiment Analysis , 2008, 2008 Eighth IEEE International Conference on Data Mining.