User Type Classification of Tweets with Implications for Event Recognition

Twitter has become one of the foremost platforms for information sharing. Consequently, it is beneficial for consumers of Twitter to know the origin of a tweet, as it affects how they view and interpret the information. In this paper, we classify tweets based on their origin, exploiting only their textual content. Specifically, using a rich linguistic feature set and a supervised classifier framework, we classify tweets into two user types: organizations and individual persons. Our user type classifier achieves an F1-score of 89% for identifying tweets that originate from organizations in English and 87% in Spanish. We also demonstrate that classifying the user type of a tweet can improve downstream event recognition tasks. We analyze several schemes that exploit user type information to enhance Twitter event recognition and show that substantial improvements can be achieved by training separate models for different user types.
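The idea of training separate event models per user type can be illustrated with a minimal sketch. Everything below is an assumption made for illustration: the tiny Naive Bayes class stands in for whatever supervised classifier is actually used, whitespace tokens stand in for the linguistic feature set, and the toy tweets, label names, and the `recognize_event` helper are hypothetical rather than taken from the paper.

```python
import math
from collections import Counter, defaultdict


def tokenize(text):
    # Whitespace tokens only; a stand-in for a richer linguistic feature set.
    return text.lower().split()


class NaiveBayes:
    """Tiny multinomial Naive Bayes with add-one smoothing (illustrative stand-in)."""

    def fit(self, texts, labels):
        self.word_counts = defaultdict(Counter)
        self.label_counts = Counter(labels)
        for text, label in zip(texts, labels):
            self.word_counts[label].update(tokenize(text))
        self.vocab = {w for counts in self.word_counts.values() for w in counts}
        return self

    def predict(self, text):
        total = sum(self.label_counts.values())
        best_label, best_score = None, -math.inf
        for label, n in self.label_counts.items():
            counts = self.word_counts[label]
            denom = sum(counts.values()) + len(self.vocab)
            score = math.log(n / total)
            for word in tokenize(text):
                score += math.log((counts[word] + 1) / denom)
            if score > best_score:
                best_label, best_score = label, score
        return best_label


# Hypothetical toy data for the user type classifier.
user_type_data = [
    ("breaking: officials confirm flooding downtown, details at our site", "organization"),
    ("our store opens at 9am, visit us for the sale", "organization"),
    ("omg i just felt the ground shake lol", "individual"),
    ("i am so tired today, need coffee", "individual"),
]
user_type_clf = NaiveBayes().fit(*zip(*user_type_data))

# One event recognizer per user type, each trained only on that type's tweets.
event_data = {
    "organization": [("officials confirm flooding downtown", "event"),
                     ("visit us for the sale", "no_event")],
    "individual": [("i just felt the ground shake", "event"),
                   ("need coffee so tired", "no_event")],
}
event_clfs = {user_type: NaiveBayes().fit(*zip(*rows))
              for user_type, rows in event_data.items()}


def recognize_event(tweet):
    # Step 1: predict the user type from text alone.
    user_type = user_type_clf.predict(tweet)
    # Step 2: route the tweet to the event model trained for that user type.
    return user_type, event_clfs[user_type].predict(tweet)


print(recognize_event("omg i felt the ground shake"))
print(recognize_event("officials confirm flooding downtown"))
```

The design point of the sketch is the routing step: the user type prediction selects which event model scores the tweet, so each event model only needs to fit the language of one kind of author.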
