Religious Politicians and Creative Photographers: Automatic User Categorization in Twitter

Finding the "right people" is a central aspect of social media systems. Twitter has millions of users with varied interests, professions and personalities. For those in fields such as advertising and marketing, it is important to identify the characteristics of the users they want to target. However, Twitter users generally do not provide sufficient information about themselves in their profiles, which makes this task difficult. In response, this work sets out to automatically infer professions (e.g., musicians, health sector workers, technicians) and personality-related attributes (e.g., creative, innovative, funny) for Twitter users based on features extracted from their content, their interaction networks, attributes of their friends and their activity patterns. We develop a comprehensive set of latent features that are then employed to perform efficient classification of users along these two dimensions (profession and personality). Our experiments on a large sample of Twitter users demonstrate a high overall accuracy in detecting profession and personality-related attributes, and highlight the benefits and pitfalls of various types of features for particular categories of users.
