Inferring gender of a Twitter user using celebrities it follows

This paper addresses the task of user gender classification in social media, with an application to Twitter. The approach automatically predicts gender from observable information: the user's tweeting behavior, the linguistic content of the user's Twitter feed, and the celebrities the user follows. The paper first evaluates linguistic content features derived from the LIWC dictionary and popular-neighborhood features derived from Wikipedia and Freebase. It then combines the two feature sets, which yields a significant increase in gender prediction accuracy. The results show that rich linguistic features combined with popular-neighborhood features are valuable and promising for other user classification tasks.
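The combination of the two feature families can be illustrated with a minimal sketch. It is not the paper's implementation: the abstract does not specify the exact features or classifier, so the category dictionary, the celebrity lookup, and all function names below are hypothetical stand-ins (the real LIWC lexicon is licensed, and the celebrity labels would come from a source such as Wikipedia or Freebase).

```python
from collections import Counter

# Hypothetical stand-in for a LIWC-style category dictionary.
TOY_CATEGORIES = {
    "pronoun": {"i", "you", "we", "she", "he"},
    "posemo": {"love", "great", "happy"},
}

# Hypothetical mapping from followed celebrity accounts to a gender label,
# assumed here to have been extracted from a knowledge base beforehand.
TOY_CELEBRITY_GENDER = {
    "celeb_a": "female",
    "celeb_b": "male",
    "celeb_c": "female",
}


def linguistic_features(tweets):
    """Return the share of tokens falling in each category (sorted by name)."""
    tokens = [tok for tweet in tweets for tok in tweet.lower().split()]
    total = len(tokens)
    counts = Counter()
    for tok in tokens:
        for category, words in TOY_CATEGORIES.items():
            if tok in words:
                counts[category] += 1
    return [counts[c] / total if total else 0.0 for c in sorted(TOY_CATEGORIES)]


def neighborhood_features(followed):
    """Return the [female, male] shares among followed accounts with a known label."""
    labels = [TOY_CELEBRITY_GENDER[h] for h in followed if h in TOY_CELEBRITY_GENDER]
    if not labels:
        return [0.0, 0.0]
    return [labels.count("female") / len(labels), labels.count("male") / len(labels)]


def combined_features(tweets, followed):
    """Concatenate both feature families into one vector for a classifier."""
    return linguistic_features(tweets) + neighborhood_features(followed)


vector = combined_features(
    ["I love this", "great day"],
    ["celeb_a", "celeb_b", "celeb_c", "unknown_account"],
)
```

The resulting vector could be passed to any standard classifier; concatenation is only one plausible way to combine the two feature sets.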
