We introduce a novel approach for automatically classifying the sentiment of Twitter messages. Each message is classified as either positive or negative with respect to a query term. This is useful for consumers who want to research sentiment toward a product before purchasing it, and for companies that want to monitor public sentiment toward their brands. To our knowledge, there is no previous research on classifying the sentiment of messages on microblogging services such as Twitter. We present the results of machine learning algorithms for classifying the sentiment of Twitter messages using distant supervision. Our training data consists of Twitter messages containing emoticons, which serve as noisy labels. This type of training data is abundantly available and can be collected automatically. We show that machine learning algorithms (Naive Bayes, Maximum Entropy, and SVM) achieve accuracy above 80% when trained on emoticon data. We also describe the preprocessing steps needed to achieve high accuracy. The main contribution of this paper is the idea of using tweets with emoticons for distantly supervised learning.
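The idea of treating emoticons as noisy labels can be illustrated with a minimal sketch. It is not the authors' implementation: the emoticon sets, the function names (`noisy_label`, `tokenize`, `train`, `classify`), the toy tweets, and the choice to strip emoticons from the text before training are all assumptions made for illustration, and the classifier is a bare unigram Naive Bayes with add-one smoothing.

```python
import math
import re
from collections import Counter

# Assumed emoticon sets; the abstract does not list the ones actually used.
POSITIVE = (":)", ":-)", ":D", "=)")
NEGATIVE = (":(", ":-(")


def noisy_label(tweet):
    """Return (label, text without emoticons), or None if the tweet is unusable."""
    has_pos = any(e in tweet for e in POSITIVE)
    has_neg = any(e in tweet for e in NEGATIVE)
    if has_pos == has_neg:  # no emoticon at all, or conflicting emoticons
        return None
    # Assumption: remove the emoticons so the model cannot simply memorize them.
    for e in POSITIVE + NEGATIVE:
        tweet = tweet.replace(e, " ")
    return ("positive" if has_pos else "negative", " ".join(tweet.split()))


def tokenize(text):
    """Lowercase unigram tokens (letters and apostrophes only)."""
    return re.findall(r"[a-z']+", text.lower())


def train(tweets):
    """Build per-class word counts and document counts from noisy labels."""
    word_counts = {"positive": Counter(), "negative": Counter()}
    doc_counts = Counter()
    for tweet in tweets:
        labeled = noisy_label(tweet)
        if labeled is None:
            continue
        label, text = labeled
        doc_counts[label] += 1
        word_counts[label].update(tokenize(text))
    return word_counts, doc_counts


def classify(text, model):
    """Pick the class with the highest add-one-smoothed log probability."""
    word_counts, doc_counts = model
    vocab = set(word_counts["positive"]) | set(word_counts["negative"])
    total_docs = sum(doc_counts.values())
    scores = {}
    for label, counts in word_counts.items():
        score = math.log((doc_counts[label] + 1) / (total_docs + 2))
        denom = sum(counts.values()) + len(vocab)
        for tok in tokenize(text):
            if tok in vocab:  # ignore words never seen in training
                score += math.log((counts[tok] + 1) / denom)
        scores[label] = score
    return max(scores, key=scores.get)


# Toy data, invented for this sketch.
tweets = [
    "I love my new phone :)",
    "great battery life :D",
    "this update is awful :(",
    "I hate the new layout :-(",
    "just landed in Boston",   # no emoticon: skipped
    "happy :) but tired :(",   # conflicting emoticons: skipped
]
model = train(tweets)
print(classify("I love the battery", model))  # -> positive
print(classify("awful layout", model))        # -> negative
```

No tweet in this sketch is labeled by hand: the training labels come entirely from the emoticons, which is what makes the supervision distant and the labels noisy.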