Robust Sentiment Detection on Twitter from Biased and Noisy Data

In this paper, we propose an approach to automatically detect sentiments on Twitter messages (tweets) that explores some characteristics of how tweets are written and meta-information of the words that compose these messages. Moreover, we leverage sources of noisy labels as our training data. These noisy labels were provided by a few sentiment detection websites over twitter data. In our experiments, we show that since our features are able to capture a more abstract representation of tweets, our solution is more effective than previous ones and also more robust regarding biased and noisy data, which is the kind of data provided by these sources.

[1]  Ellen Riloff,et al.  Learning subjective nouns using extraction pattern bootstrapping , 2003, CoNLL.

[2]  Bo Pang,et al.  A Sentimental Education: Sentiment Analysis Using Subjectivity Summarization Based on Minimum Cuts , 2004, ACL.

[3]  Ian Witten,et al.  Data Mining , 2000 .

[4]  Janyce Wiebe,et al.  Development and Use of a Gold-Standard Data Set for Subjectivity Classifications , 1999, ACL.

[5]  E. Huffman,et al.  Social Media's New Role in Emergency Management , 2008 .

[6]  R. Polikar,et al.  Ensemble based systems in decision making , 2006, IEEE Circuits and Systems Magazine.

[7]  Panagiotis G. Ipeirotis,et al.  Get another label? improving data quality and data mining using multiple, noisy labelers , 2008, KDD.

[8]  Ellen Riloff,et al.  Creating Subjective and Objective Sentence Classifiers from Unannotated Texts , 2005, CICLing.

[9]  Janyce Wiebe,et al.  Recognizing Contextual Polarity in Phrase-Level Sentiment Analysis , 2005, HLT.

[10]  Bo Pang,et al.  Thumbs up? Sentiment Classification using Machine Learning Techniques , 2002, EMNLP.

[11]  Siddharth Patwardhan,et al.  Feature Subsumption for Opinion Analysis , 2006, EMNLP.

[12]  Ellen Riloff,et al.  Learning Extraction Patterns for Subjective Expressions , 2003, EMNLP.

[13]  Jacob Cohen A Coefficient of Agreement for Nominal Scales , 1960 .

[14]  J. R. Landis,et al.  The measurement of observer agreement for categorical data. , 1977, Biometrics.

[15]  Matthew Hurst,et al.  Deriving marketing intelligence from online discussion , 2005, KDD '05.