#WhoAmI in 160 Characters? Classifying Social Identities Based on Twitter Profile Descriptions

We combine social theory and NLP methods to classify the online social identities that English-speaking Twitter users express in their profile descriptions. We conduct two text classification experiments. In Experiment 1, we use a five-category online social identity classification based on identity and self-categorization theories. We can automatically classify two identity categories (Relational and Occupational), but automatic classification of the other three (Political, Ethnic/religious and Stigmatized) is challenging. In Experiment 2, we test a merger of these three identities based on theoretical arguments. We find that combining them improves the predictive performance of the classifiers. Our study shows how social theory can be used to guide NLP methods, and how such methods, in turn, provide input for revisiting traditional social theory, which is strongly consolidated in offline settings.
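The abstract does not name the classifiers or features used, so the following is a hypothetical sketch that only illustrates the difference between the two label schemes. In Experiment 1, one binary classifier is trained per category. In Experiment 2, Political, Ethnic/religious and Stigmatized are first collapsed into a single label. The sketch rests on several assumptions. A description may carry several identity labels at once, so the setup is one-vs-rest. The merged label name `Merged` is made up. The multinomial Naive Bayes classifier is a stand-in chosen for brevity, not the study's method. The toy profiles are invented.

```python
import math
import re
from collections import Counter

# The five categories of Experiment 1, as listed in the abstract.
CATEGORIES = ["Relational", "Occupational", "Political", "Ethnic/religious", "Stigmatized"]

# The three hard-to-classify categories merged in Experiment 2.
TO_MERGE = {"Political", "Ethnic/religious", "Stigmatized"}
MERGED = "Merged"  # hypothetical name for the merged category


def merge_labels(labels):
    """Map a profile's set of identity labels onto the merged scheme."""
    return {MERGED if label in TO_MERGE else label for label in labels}


def label_space(merged):
    """Return the categories used under the original or the merged scheme."""
    return sorted(merge_labels(CATEGORIES)) if merged else list(CATEGORIES)


def tokenize(text):
    """Lowercase bag-of-words tokens; keeps hashtags and mentions."""
    return re.findall(r"[a-z#@']+", text.lower())


def train_binary_nb(docs, positives, alpha=1.0):
    """One-vs-rest multinomial Naive Bayes with Laplace smoothing (stand-in model)."""
    counts = {True: Counter(), False: Counter()}
    for tokens, y in zip(docs, positives):
        counts[y].update(tokens)
    n_docs = Counter(positives)
    vocab = set(counts[True]) | set(counts[False])
    model = {}
    for y in (True, False):
        log_prior = math.log((n_docs[y] + alpha) / (len(docs) + 2 * alpha))
        model[y] = (log_prior, counts[y], sum(counts[y].values()))
    return model, vocab, alpha


def predict_binary_nb(trained, tokens):
    """Return True if the positive class has the higher log-posterior."""
    model, vocab, alpha = trained
    scores = {}
    for y, (log_prior, counts, total) in model.items():
        denom = total + alpha * len(vocab)
        scores[y] = log_prior + sum(
            math.log((counts[t] + alpha) / denom) for t in tokens if t in vocab
        )
    return scores[True] > scores[False]


def train_all(profiles, merged=False):
    """Train one binary classifier per category under either label scheme."""
    docs = [tokenize(text) for text, _ in profiles]
    label_sets = [merge_labels(ls) if merged else set(ls) for _, ls in profiles]
    return {
        c: train_binary_nb(docs, [c in ls for ls in label_sets])
        for c in label_space(merged)
    }


# Invented toy profiles, for illustration only (not data from the study).
TOY_PROFILES = [
    ("proud mom of two and wife", {"Relational"}),
    ("nurse at city hospital", {"Occupational"}),
    ("democrat and activist", {"Political"}),
    ("muslim proud of my faith", {"Ethnic/religious"}),
    ("mom and nurse", {"Relational", "Occupational"}),
]

models = train_all(TOY_PROFILES, merged=True)
print(sorted(models))  # ['Merged', 'Occupational', 'Relational']
print(predict_binary_nb(models["Merged"], tokenize("democrat activist")))  # True
```

The toy data shows one practical side effect of the merge. Political and Ethnic/religious each have a single positive example and Stigmatized has none, while the merged label pools two. The study itself motivates the merger on theoretical grounds.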
