Mirroring the real world in social media: Twitter, geolocation, and sentiment analysis

In recent years, social media has been used to characterize and predict real-world events. In this research we investigate how closely Twitter mirrors the real world. Specifically, we characterize the relationship between the language used on Twitter and the results of the 2011 NBA Playoff games. We hypothesize that the language of a tweet is useful for jointly classifying the user's location and which team is leading the game at the time the tweet is posted. This rests on the common assumption that "fans" of a team express more positive sentiment, and therefore use different language, when their team is doing well. We test this hypothesis by labeling each tweet with the location of the user and the team that is in the lead at the time of the tweet. If the hypothesized difference in language (as measured by tf-idf) exists, it should have predictive power over these tweet labels. We find that it does, and we experiment further by adding semantic orientation (SO) information to the feature set. SO offers little improvement over tf-idf alone. We discuss the relative strengths of the two types of features for our data.
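The labeling-and-classification setup described in the abstract can be sketched in a few lines. This is a minimal sketch under stated assumptions, not the authors' pipeline: the composite label format `city|leading_team`, the naive whitespace tokenizer, the smoothed idf, and the nearest-centroid cosine classifier (a simple stand-in for whatever learner the study actually used) are all illustrative choices, and the tweets are hypothetical. The hypothetical `so_pmi` helper shows one common way to obtain a semantic-orientation score, Turney-style PMI computed from co-occurrence counts with a positive and a negative seed word; how the study itself computed SO is not stated here.

```python
import math
from collections import Counter, defaultdict


def make_label(user_city: str, leading_team: str) -> str:
    """Hypothetical composite class: user location plus the team leading when the tweet is posted."""
    return f"{user_city}|{leading_team}"


def tokenize(text: str) -> list[str]:
    # Deliberately naive: lowercase + whitespace split (real tweets need more care).
    return text.lower().split()


def fit_idf(docs: list[list[str]]) -> dict[str, float]:
    """Smoothed inverse document frequency over the training tweets."""
    n = len(docs)
    df = Counter()
    for toks in docs:
        df.update(set(toks))  # count each term once per tweet
    return {t: math.log((1 + n) / (1 + c)) + 1.0 for t, c in df.items()}


def normalize(vec: dict[str, float]) -> dict[str, float]:
    norm = math.sqrt(sum(v * v for v in vec.values()))
    return {t: v / norm for t, v in vec.items()} if norm else {}


def tfidf(tokens: list[str], idf: dict[str, float]) -> dict[str, float]:
    """L2-normalised tf-idf vector; tokens unseen in training are dropped."""
    tf = Counter(t for t in tokens if t in idf)
    return normalize({t: c * idf[t] for t, c in tf.items()})


def train_centroids(texts, labels, idf):
    """One centroid per composite label: summed tf-idf vectors, re-normalised."""
    sums = defaultdict(Counter)
    for text, label in zip(texts, labels):
        for t, v in tfidf(tokenize(text), idf).items():
            sums[label][t] += v
    return {label: normalize(dict(vec)) for label, vec in sums.items()}


def predict(text, centroids, idf) -> str:
    vec = tfidf(tokenize(text), idf)
    # Both sides are unit vectors, so the dot product is the cosine similarity.
    return max(centroids,
               key=lambda lab: sum(v * centroids[lab].get(t, 0.0) for t, v in vec.items()))


def so_pmi(near_pos: float, near_neg: float, hits_pos: float, hits_neg: float,
           eps: float = 0.01) -> float:
    """Turney-style SO-PMI: log2 of (near_pos * hits_neg) / (near_neg * hits_pos), eps-smoothed."""
    return math.log2(((near_pos + eps) * hits_neg) / ((near_neg + eps) * hits_pos))


train = [  # hypothetical toy tweets, not data from the study
    ("Miami", "MIA", "lets go heat great defense"),
    ("Dallas", "MIA", "awful refs heat get every call"),
    ("Dallas", "DAL", "dirk is clutch lets go mavs"),
    ("Miami", "DAL", "awful shooting heat wake up"),
]
texts = [t for _, _, t in train]
labels = [make_label(city, lead) for city, lead, _ in train]
idf = fit_idf([tokenize(t) for t in texts])
centroids = train_centroids(texts, labels, idf)
print(predict("great defense heat", centroids, idf))  # -> Miami|MIA
print(predict("awful refs again", centroids, idf))    # -> Dallas|MIA
```

Replacing the nearest-centroid rule with a linear classifier over the same vectors, or appending each tweet's SO score as one extra feature, would be natural ways to extend this sketch toward the comparison the abstract describes.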
