Predicting Twitter User Socioeconomic Attributes with Network and Language Information

Inferring socioeconomic attributes of social media users, such as occupation and income, is an important problem in computational social science. Automated inference of such characteristics has applications in personalised recommender systems, targeted computational advertising and online political campaigning. While previous work has shown that language features can reliably predict socioeconomic attributes on Twitter, information from users' social networks has not yet been explored for such complex user characteristics. In this paper, we describe a method for predicting the occupational class and income of Twitter users from information extracted from their extended networks, by learning a low-dimensional vector representation of each user, i.e. a graph embedding. We use this representation to train predictive models for occupational class and income. Results on two publicly available datasets show that our method consistently outperforms state-of-the-art methods on both tasks. We obtain further significant improvements when we combine graph embeddings with textual features, demonstrating that social network and language information are complementary.
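
The abstract does not state which embedding algorithm or feature-combination scheme is used, so the following is only a minimal sketch under stated assumptions. It assumes a DeepWalk-style approach, in which random walks over the follower graph form a corpus that a skip-gram model (not shown here) would turn into user vectors, and it assumes that graph and text features are combined by simple concatenation. The toy graph, the user ids, and the function names `random_walks` and `combine_features` are hypothetical.

```python
import random


def random_walks(graph, walk_length, walks_per_node, seed=0):
    """Generate uniform random walks starting from every node.

    Assumption: a DeepWalk-style corpus; each walk would later be treated
    as a "sentence" for a skip-gram model that learns node embeddings.
    """
    rng = random.Random(seed)
    walks = []
    for start in sorted(graph):
        for _ in range(walks_per_node):
            walk = [start]
            while len(walk) < walk_length:
                neighbours = graph[walk[-1]]
                if not neighbours:
                    break  # dead end: stop this walk early
                walk.append(rng.choice(neighbours))
            walks.append(walk)
    return walks


def combine_features(graph_embedding, text_features):
    """Concatenate a user's graph embedding with their text feature vector.

    Assumption: plain concatenation; the paper may combine them differently.
    """
    return list(graph_embedding) + list(text_features)


# Hypothetical toy follower graph stored as adjacency lists.
toy_graph = {
    "u1": ["u2", "u3"],
    "u2": ["u1", "u3"],
    "u3": ["u1", "u2", "u4"],
    "u4": ["u3"],
}

walks = random_walks(toy_graph, walk_length=5, walks_per_node=2)

# Made-up vectors standing in for a learned embedding and text features.
combined = combine_features([0.1, -0.3], [0.0, 1.0, 0.5])
```

The combined vector would then be passed to any standard classifier (for occupational class) or regressor (for income).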
