Probabilistic Inference of Twitter Users' Age Based on What They Follow

Twitter provides an open and rich source of data for studying human behaviour at scale and is widely used in the social and network sciences. However, a major criticism of Twitter data is that demographic information is largely absent. Enhancing Twitter data with user ages would advance our ability to study social network structures, information flows and the spread of contagions. Existing approaches to detecting the age of Twitter users typically rely on specific properties of tweets, e.g., linguistic features, which are language-dependent. In this paper, we devise a language-independent methodology for determining the age of Twitter users from data that is native to the Twitter ecosystem. The key idea is to use a Bayesian framework to generalise ground-truth age information from a small set of Twitter users to the entire network based on which accounts they follow. Our approach scales to inferring the age of 700 million Twitter accounts with high accuracy.
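The key idea of propagating age labels from a few labelled users to everyone else through the accounts they follow can be illustrated with a minimal sketch. It uses multinomial naive Bayes with add-alpha smoothing as a stand-in Bayesian model. This is an assumption made for illustration and not necessarily the model used in this paper. Each followed account is treated like a word in a document. The labelled users supply per-age-group counts of who follows what, and an unlabelled user's posterior over age groups combines the group prior with the smoothed likelihood of every account that user follows. The function names (`train`, `posterior`), the smoothing parameter `alpha`, the age brackets and the account names are all hypothetical.

```python
import math
from collections import Counter, defaultdict


def train(labelled, follows, alpha=1.0):
    """Fit a multinomial naive Bayes model over followed accounts (illustrative stand-in).

    labelled: user id -> age group (ground truth for a few users)
    follows:  user id -> set of account ids that user follows
    alpha:    add-alpha (Laplace) smoothing pseudo-count (hypothetical default)
    """
    prior = Counter(labelled.values())  # number of labelled users per age group
    counts = defaultdict(Counter)       # counts[group][account] = labelled followers in group
    vocab = set()                       # every account followed by some labelled user
    for user, group in labelled.items():
        for account in follows.get(user, ()):
            counts[group][account] += 1
            vocab.add(account)
    return prior, counts, vocab, alpha


def posterior(model, followed):
    """Return P(age group | followed accounts) for one unlabelled user."""
    prior, counts, vocab, alpha = model
    n_users = sum(prior.values())
    v = len(vocab)
    log_scores = {}
    for group, n_group in prior.items():
        total = sum(counts[group].values())
        score = math.log(n_group / n_users)  # log prior of the age group
        for account in followed:
            if account not in vocab:
                continue  # accounts never seen in training carry no evidence in this sketch
            score += math.log((counts[group][account] + alpha) / (total + alpha * v))
        log_scores[group] = score
    # Normalise in log space (log-sum-exp) to avoid underflow for users who follow many accounts.
    m = max(log_scores.values())
    z = sum(math.exp(s - m) for s in log_scores.values())
    return {g: math.exp(s - m) / z for g, s in log_scores.items()}


# Toy data: account names and age brackets are hypothetical.
follows = {
    "u1": {"teen_idol", "game_studio"},
    "u2": {"teen_idol"},
    "u3": {"news_channel", "gardening_tips"},
    "u4": {"news_channel"},
}
labelled = {"u1": "13-17", "u2": "13-17", "u3": "55+", "u4": "55+"}

model = train(labelled, follows)
post = posterior(model, {"teen_idol", "game_studio"})
for group, p in post.items():
    print(group, f"{p:.3f}")  # prints "13-17 0.857" then "55+ 0.143"
```

The model only needs follow lists, so it does not depend on the language in which users tweet. Treating followed accounts as conditionally independent given age is a simplification chosen here for brevity.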
