French presidential elections: what are the most efficient measures for tweets?

Tweets exchanged over the Internet are an important source of information, even though their characteristics make them difficult to analyze (e.g., a maximum of 140 characters and noisy data). In this paper, we tackle the problem of extracting relevant topics from tweets coming from different communities. More precisely, we ask the following question: which terms are the most relevant for a given community? To answer it, we define and evaluate new variants of the traditional TF-IDF measure. We also show that our measures are well suited to recommending a community affiliation to a new user. Experiments were conducted on tweets collected during the French presidential and legislative elections in 2012. The results underline the quality and usefulness of our proposal.
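
For orientation, the classic TF-IDF baseline that the proposed variants start from can be sketched as follows. This is only an illustrative sketch, not the measures defined in the paper: it assumes that all tweets of a community are pooled into a single "document", uses naive whitespace tokenization, and the function name `community_tf_idf` and the toy data are hypothetical.

```python
import math
from collections import Counter


def community_tf_idf(tweets_by_community):
    """Score each term per community with classic TF-IDF.

    Assumption: the tweets of one community are pooled into a single
    "document"; the paper's own variants are not reproduced here.
    """
    # Term counts per community (naive lowercase whitespace tokenization).
    counts = {
        name: Counter(word for tweet in tweets for word in tweet.lower().split())
        for name, tweets in tweets_by_community.items()
    }
    n_communities = len(counts)
    # Document frequency: number of communities in which a term appears.
    doc_freq = Counter(term for c in counts.values() for term in c)

    scores = {}
    for name, c in counts.items():
        total = sum(c.values())
        scores[name] = {
            # tf = relative frequency in the community, idf = log(N / df)
            term: (freq / total) * math.log(n_communities / doc_freq[term])
            for term, freq in c.items()
        }
    return scores


# Hypothetical toy data, not taken from the paper's corpus.
toy = {
    "community_a": ["vote economy jobs", "economy growth vote"],
    "community_b": ["vote ecology climate", "climate energy vote"],
}
scores = community_tf_idf(toy)
top_a = max(scores["community_a"], key=scores["community_a"].get)
```

In this toy setting, a term shared by every community (such as `vote`) receives a score of zero, while terms concentrated in one community rank highest for it. A new user could then, under the same assumptions, be matched to the community whose top-scoring terms overlap most with the user's own tweets.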
