What's my age?: Predicting Twitter User's Age using Influential Friend Network and DBpedia

Social media is a rich source of user behavior and opinions. Twitter receives nearly 500 million tweets per day from 328 million users. An appropriate machine learning pipeline over this information enables up-to-date and cost-effective data collection for a wide variety of domains, such as social science, public health, and the wisdom of the crowd. In many of these domains, users' demographic information is key to identifying the segments of the population being studied, for instance: which age groups are observed to abuse which drugs, or which ethnicities are most affected by depression in each location? Twitter in its current state does not require users to provide any demographic information. We propose a machine learning system, coupled with the DBpedia graph, that predicts the most probable age of a Twitter user. In building an age prediction model from social media text and user metadata, we explore the existing state-of-the-art approaches. We detail our data collection, feature engineering cycle, model selection, and evaluation pipeline, and demonstrate the efficacy of our approach by comparing it with a "predict mean" age estimator baseline.
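
A "predict mean" baseline ignores all user features and always outputs the mean age seen in training, so any useful model should beat it. The following is a minimal sketch of such a comparison, not the authors' implementation: the ages, the model predictions, and the choice of mean absolute error and R² as metrics are all illustrative assumptions, since the abstract does not specify them.

```python
from statistics import mean


def predict_mean_baseline(train_ages):
    """Return a predictor that always outputs the mean training age."""
    mu = mean(train_ages)
    return lambda _user=None: mu


def mean_absolute_error(y_true, y_pred):
    """Average absolute difference between true and predicted ages."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


def r_squared(y_true, y_pred):
    """Coefficient of determination; at most 0 for a constant predictor."""
    mu = mean(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mu) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


# Hypothetical data: these ages and predictions are made up for illustration.
train_ages = [18, 22, 25, 31, 40, 52]
test_ages = [20, 30, 40]

baseline = predict_mean_baseline(train_ages)
baseline_preds = [baseline() for _ in test_ages]

# Stand-in for the output of a trained age regressor (assumed values).
model_preds = [22.0, 29.0, 37.0]

baseline_mae = mean_absolute_error(test_ages, baseline_preds)
model_mae = mean_absolute_error(test_ages, model_preds)

print(f"baseline MAE: {baseline_mae:.2f}, R^2: {r_squared(test_ages, baseline_preds):.3f}")
print(f"model MAE:    {model_mae:.2f}, R^2: {r_squared(test_ages, model_preds):.3f}")
```

A model is only worth reporting if its error is clearly below the baseline's on held-out users; a constant prediction can never achieve a positive R² on the evaluation set.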
