Statistical Natural Language Processing Final Project
Sentiment Classification in Twitter: A Comparison between Domain Adaptation and Distant Supervision

In this paper we empirically study the accuracy of various NLP methods for classifying the sentiment of Tweets. We first try the more traditional approach of using labeled movie reviews as a training set, and then attempt a less conventional technique that uses emoticons in Tweets as noisy sentiment labels. We compare the advantages and disadvantages of each approach and determine whether certain modifications to the pre-processing of reviews or Tweets significantly improve accuracy. The methods we implement are Bag of Words, Maximum Entropy, Perceptron, Averaged Perceptron, and bootstrapping our training set. At the end of the paper we summarize the best- and worst-performing methods and discuss the techniques used.
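As a rough illustration of two ideas named in the abstract, distant supervision with emoticons as noisy labels and an averaged perceptron over bag-of-words features, here is a minimal Python sketch. It is not the authors' implementation: the emoticon sets, the whitespace tokenization, the helper names (`noisy_label`, `bag_of_words`, `train_averaged_perceptron`, `predict`) and the toy Tweets are all hypothetical. The averaging uses a common running-sum trick so that intermediate weight vectors never have to be stored.

```python
import re
from collections import defaultdict

# Hypothetical emoticon sets; the exact lists used in the paper are not stated.
POSITIVE = {":)", ":-)", ":D", "=)"}
NEGATIVE = {":(", ":-(", ":'("}


def noisy_label(tweet):
    """Distant supervision: derive a noisy label (+1 / -1) from emoticons.

    Returns (label, text_without_emoticons), or None if the tweet has
    no emoticon or has both positive and negative ones.
    """
    tokens = tweet.split()  # assumption: emoticons are whitespace-separated
    has_pos = any(t in POSITIVE for t in tokens)
    has_neg = any(t in NEGATIVE for t in tokens)
    if has_pos == has_neg:
        return None
    # Strip the emoticons so the classifier must learn from the words alone.
    text = " ".join(t for t in tokens if t not in POSITIVE and t not in NEGATIVE)
    return (1 if has_pos else -1), text


def bag_of_words(text):
    """Lowercased unigram counts as a sparse feature dict."""
    feats = defaultdict(int)
    for tok in re.findall(r"[a-z']+", text.lower()):
        feats[tok] += 1
    return feats


def score(weights, bias, feats):
    return sum(weights.get(f, 0.0) * v for f, v in feats.items()) + bias


def train_averaged_perceptron(examples, epochs=5):
    """Binary averaged perceptron over (label, feature dict) pairs.

    Running-sum trick: w - u / c equals the average of all weight
    vectors held during training (including the initial zero vector).
    """
    w, u = defaultdict(float), defaultdict(float)
    b, beta, c = 0.0, 0.0, 1
    for _ in range(epochs):
        for y, feats in examples:
            if y * score(w, b, feats) <= 0:  # mistake: perceptron update
                for f, v in feats.items():
                    w[f] += y * v
                    u[f] += c * y * v
                b += y
                beta += c * y
            c += 1  # counts every example seen, not only mistakes
    avg_w = {f: w[f] - u[f] / c for f in w}
    return avg_w, b - beta / c


def predict(weights, bias, text):
    return 1 if score(weights, bias, bag_of_words(text)) > 0 else -1


tweets = [
    "I love this phone :)",
    "great day with friends :D",
    "what a great movie :)",
    "I hate mondays :(",
    "this is terrible and sad :(",
    "awful service, hate it :-(",
    "no emoticon here",      # skipped: no noisy label
    "mixed feelings :) :(",  # skipped: conflicting labels
]
labeled = [r for r in map(noisy_label, tweets) if r is not None]
examples = [(y, bag_of_words(text)) for y, text in labeled]
weights, bias = train_averaged_perceptron(examples)
print(predict(weights, bias, "I love it"))            # 1
print(predict(weights, bias, "so sad and terrible"))  # -1
```

Removing the emoticons before training is a deliberate choice in this sketch: most Tweets seen at test time carry no emoticon, so a model that relied on the emoticon tokens themselves would likely generalize poorly.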

Michael Khanarian | David Álvarez-Melis