Twitter as a Corpus for Sentiment Analysis and Opinion Mining

Microblogging today has become a very popular communication tool among Internet users. Millions of users share opinions on different aspects of life everyday. Therefore microblogging web-sites are rich sources of data for opinion mining and sentiment analysis. Because microblogging has appeared relatively recently, there are a few research works that were devoted to this topic. In our paper, we focus on using Twitter, the most popular microblogging platform, for the task of sentiment analysis. We show how to automatically collect a corpus for sentiment analysis and opinion mining purposes. We perform linguistic analysis of the collected corpus and explain discovered phenomena. Using the corpus, we build a sentiment classifier, that is able to determine positive, negative and neutral sentiments for a document. Experimental evaluations show that our proposed techniques are efficient and performs better than previously proposed methods. In our research, we worked with English, however, the proposed technique can be used with any other language.

[1]  Claude E. Shannon,et al.  The Mathematical Theory of Communication , 1950 .

[2]  Raymond H. Myers,et al.  Probability and Statistics for Engineers and Scientists. , 1973 .

[3]  Helmut Schmidt,et al.  Probabilistic part-of-speech tagging using decision trees , 1994 .

[4]  Anthony J. Hayter,et al.  Probability and statistics for engineers and scientists , 1996 .

[5]  Patrick Paroubek,et al.  The GRACE french part-of-speech tagging evaluation task , 1998, LREC.

[6]  Ted Pedersen,et al.  A Simple Approach to Building Ensembles of Naive Bayesian Classifiers for Word Sense Disambiguation , 2000, ANLP.

[7]  Andrew McCallum,et al.  Conditional Random Fields: Probabilistic Models for Segmenting and Labeling Sequence Data , 2001, ICML.

[8]  Bo Pang,et al.  Thumbs up? Sentiment Classification using Machine Learning Techniques , 2002, EMNLP.

[9]  David M. Pennock,et al.  Mining the peanut gallery: opinion extraction and semantic classification of product reviews , 2003, WWW '03.

[10]  P. Kantor Foundations of Statistical Natural Language Processing , 2001, Information Retrieval.

[11]  Ethem Alpaydin,et al.  Introduction to Machine Learning (Adaptive Computation and Machine Learning) , 2004 .

[12]  Jonathon Read,et al.  Using Emoticons to Reduce Dependency in Machine Learning Techniques for Sentiment Classification , 2005, ACL.

[13]  Janyce Wiebe,et al.  Recognizing Contextual Polarity in Phrase-Level Sentiment Analysis , 2005, HLT.

[14]  Sang Joon Kim,et al.  A Mathematical Theory of Communication , 2006 .

[15]  Hsin-Hsi Chen,et al.  Emotion Classification Using Web Blog Corpora , 2007, IEEE/WIC/ACM International Conference on Web Intelligence (WI'07).

[16]  Lillian Lee,et al.  Opinion Mining and Sentiment Analysis , 2008, Found. Trends Inf. Retr..

[17]  Bernard J. Jansen,et al.  Micro-blogging as online word of mouth branding , 2009, CHI Extended Abstracts.