Gender identity and lexical variation in social media

We present a study of the relationship between gender, linguistic style, and social networks, using a novel corpus of 14,000 Twitter users. Prior quantitative work on gender often treats this social variable as a female/male binary; we argue for a more nuanced approach. By clustering Twitter users, we find a natural decomposition of the dataset into various styles and topical interests. Many clusters have strong gender orientations, but their use of linguistic resources sometimes directly conflicts with the population-level language statistics. We view these clusters as a more accurate reflection of the multifaceted nature of gendered language styles. Previous corpus-based work has also had little to say about individuals whose linguistic styles defy population-level gender patterns. To identify such individuals, we train a statistical classifier, and measure the classifier confidence for each individual in the dataset. Examining individuals whose language does not match the classifier's model for their gender, we find that they have social networks that include significantly fewer same-gender social connections and that, in general, social network homophily is correlated with the use of same-gender language markers. Pairing computational methods and social theory thus offers a new perspective on how gender emerges as individuals position themselves relative to audiences, topics, and mainstream gender norms.
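
The classifier-confidence analysis described above can be illustrated with a minimal sketch. The abstract says only that "a statistical classifier" is trained, so everything concrete here is an assumption made for illustration: the multinomial naive Bayes model, the leave-one-out scoring, the helper names (`train_naive_bayes`, `posterior`, `pearson`), and the toy users with placeholder tokens and made-up same-gender network fractions. It is not the authors' implementation and the numbers are not results from the study.

```python
import math
from collections import Counter

def train_naive_bayes(docs, labels, alpha=1.0):
    """Fit a multinomial naive Bayes model with add-alpha smoothing."""
    model = {"vocab": {w for d in docs for w in d}, "alpha": alpha,
             "prior": {}, "counts": {}, "totals": {}}
    for c in sorted(set(labels)):
        class_docs = [d for d, y in zip(docs, labels) if y == c]
        counts = Counter(w for d in class_docs for w in d)
        model["prior"][c] = math.log(len(class_docs) / len(docs))
        model["counts"][c] = counts
        model["totals"][c] = sum(counts.values())
    return model

def posterior(model, doc):
    """Return P(class | doc) for every class, normalised via log-sum-exp."""
    v, a = len(model["vocab"]), model["alpha"]
    scores = {}
    for c, log_prior in model["prior"].items():
        s = log_prior
        for w in doc:
            if w in model["vocab"]:  # words unseen in training are skipped
                s += math.log((model["counts"][c][w] + a)
                              / (model["totals"][c] + a * v))
        scores[c] = s
    top = max(scores.values())
    z = sum(math.exp(s - top) for s in scores.values())
    return {c: math.exp(s - top) / z for c, s in scores.items()}

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)

# Hypothetical toy users: (tokens, labelled gender, fraction of same-gender
# connections). Tokens are abstract placeholders, not real lexical markers.
users = [
    (["a", "a", "b"], "F", 0.80),
    (["a", "b", "a"], "F", 0.70),
    (["c", "c", "a"], "F", 0.30),
    (["c", "c", "d"], "M", 0.90),
    (["c", "d", "d"], "M", 0.75),
    (["a", "a", "d"], "M", 0.40),
]

# Leave-one-out: score each user with a model trained on all other users,
# and record the probability assigned to that user's labelled gender.
confidences = []
for i, (tokens, gender, _) in enumerate(users):
    rest = users[:i] + users[i + 1:]
    model = train_naive_bayes([t for t, _, _ in rest], [g for _, g, _ in rest])
    confidences.append(posterior(model, tokens)[gender])

same_gender_fraction = [frac for _, _, frac in users]
r = pearson(confidences, same_gender_fraction)
print("confidence in labelled gender:", [round(c, 3) for c in confidences])
print("correlation with same-gender network fraction:", round(r, 3))
```

A low confidence value marks a user whose word use does not match the model for their labelled gender; the correlation then asks whether such users also tend to have fewer same-gender connections, which is the relationship the abstract reports. On real data one would also want a significance test and a held-out evaluation scheme suited to the corpus size.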
