Linguistic Diversities of Demographic Groups in Twitter

The massive popularity of online social media provides a unique opportunity for researchers to study the linguistic characteristics and interaction patterns of users. In this paper, we provide an in-depth characterization of language usage across demographic groups on Twitter. In particular, we infer the gender and race of Twitter users located in the U.S. using image processing algorithms from Face++. We then investigate how demographic groups (i.e., male/female and Asian/Black/White) differ in terms of linguistic style and interests. We extract linguistic features from six categories (affective attributes, cognitive attributes, lexical density and awareness, temporal references, social and personal concerns, and interpersonal focus) to identify similarities and differences between groups within each set of writing attributes. In addition, we compute the absolute difference in the rankings of top phrases between demographic groups. As a further dimension of diversity, we also consider the topics of interest retrieved for each user. Our analysis reveals clear differences in the writing styles (and topics of interest) of different demographic groups, with variation along both gender and race lines. We hope our effort can stimulate new studies of demographic information in the online space.
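The abstract does not spell out how the absolute ranking difference of top phrases is computed, so the following is only a minimal sketch under stated assumptions: each group is represented by a flat list of already-extracted phrases, phrases are ranked by frequency within each group (ties broken alphabetically), only phrases in the top `top_k` of at least one group are compared, and a phrase absent from a group receives the rank just past that group's last ranked phrase. The function names `phrase_ranks` and `absolute_rank_differences`, the `top_k` parameter, and the missing-phrase convention are all hypothetical, not the authors' method.

```python
from collections import Counter


def phrase_ranks(phrases):
    """Map each phrase to its frequency rank within one group (1 = most frequent).

    Ties are broken alphabetically so the ranking is deterministic
    (an assumption; the tie-breaking rule is not given in the source).
    """
    counts = Counter(phrases)
    ordered = sorted(counts, key=lambda p: (-counts[p], p))
    return {phrase: rank for rank, phrase in enumerate(ordered, start=1)}


def absolute_rank_differences(phrases_a, phrases_b, top_k=10):
    """Return (phrase, |rank_a - rank_b|) pairs, largest difference first.

    Hypothetical convention: a phrase that never occurs in a group gets
    the rank immediately after that group's last ranked phrase.
    """
    ranks_a = phrase_ranks(phrases_a)
    ranks_b = phrase_ranks(phrases_b)
    missing_a = len(ranks_a) + 1
    missing_b = len(ranks_b) + 1

    # Compare only phrases in the top-k of at least one group.
    candidates = {p for p, r in ranks_a.items() if r <= top_k}
    candidates |= {p for p, r in ranks_b.items() if r <= top_k}

    diffs = {
        p: abs(ranks_a.get(p, missing_a) - ranks_b.get(p, missing_b))
        for p in candidates
    }
    # Sort by descending difference; ties broken alphabetically.
    return sorted(diffs.items(), key=lambda kv: (-kv[1], kv[0]))


if __name__ == "__main__":
    # Toy phrase lists, invented purely for illustration.
    group_a = ["game day"] * 3 + ["new music"] * 2 + ["love you"]
    group_b = ["love you"] * 3 + ["new music"] * 2 + ["game day"]
    for phrase, diff in absolute_rank_differences(group_a, group_b, top_k=3):
        print(phrase, diff)
```

On the toy data, "game day" and "love you" swap between ranks 1 and 3 (difference 2), while "new music" holds rank 2 in both groups (difference 0). Sorting by the largest difference surfaces the phrases whose popularity diverges most between two groups, which is the kind of contrast the ranking comparison is meant to expose.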
