AUDIENCE-MODULATED VARIATION IN ONLINE SOCIAL MEDIA

Stylistic variation in online social media writing is well attested: for example, geographical analysis of the social media service Twitter has replicated isoglosses for many known lexical variables from speech, while simultaneously revealing a wealth of new geographical lexical variables, including emoticons, phonetic spellings, and phrasal abbreviations. However, less is known about the social role of variation in online writing. This paper examines online writing variation in the context of audience design, focusing on affordances offered by Twitter that allow users to modulate a message's intended audience. We find that the frequency of non-standard lexical variables is inversely related to the size of the intended audience: as writers target smaller audiences, the frequency of these variables increases. In addition, these variables are more often used in messages addressed to individuals who are known to be geographically local. This pattern holds not only for geographically differentiated lexical variables, but also for non-standard variables that are widely used throughout the United States. These findings suggest that users of social media are attuned both to the nature of their audience and to the social meaning of lexical variation, and that they customize their self-presentation accordingly.

Introduction

Social media writing is often stylistically distinct from other written genres (Crystal 2006; Eisenstein 2013a), but it also displays an impressive internal stylistic diversity (Herring 2007; Androutsopoulos 2011). Many stylistic variables in social media have been shown to align with macro-level properties of the author, such as geographical location (Eisenstein et al. 2010), age (Schler et al. 2006), race (Eisenstein, Smith, and Xing 2011), and gender (Herring and Paolillo 2006).
Linguistic differences are robust enough to support unnervingly accurate predictions of these characteristics from writing style, with algorithmic predictions in some cases outperforming human judgments (Burger et al. 2011). This focus on prediction aligns with Silverstein's (2003) concept of first-order indexicality: the direct association of linguistic variables with macro-level social categories. The huge size of social media corpora makes it easy to identify hundreds of such variables through statistical analysis (e.g., Eisenstein, Smith, and Xing 2011).

But social media data have more to offer sociolinguistics than size alone: even though platforms such as Twitter are completely public, they capture language use in natural contexts with real social stakes. These platforms play host to a diverse array of interactional situations, from high school gossip to political debate, and from career networking to intense music fandom. As such, social media data offer new possibilities for understanding the social nature of language: not only who says what, but how stylistic variables are perceived by readers and writers, and how they are used to achieve communicative goals.

In this paper, we focus on the relevance of audience to sociolinguistic variation. A rich theoretical literature is already dedicated to this issue, including models of accommodation (Giles, Coupland, and Coupland 1991), audience design (Bell 1984), and stancetaking (Du Bois 2007). Empirical evidence for these models has typically come from relatively small corpora of conversational speech, with a small number of hand-chosen variables. Indeed, the applicability of audience design and related models to a large-scale corpus of online written communication may appear doubtful: is audience a relevant and quantifiable concept in social media? In public “broadcast” media such as blogs, the properties of the audience seem difficult to identify.
Conversely, in directed communication such as e-mail and SMS, the identity of the audience is clear, but acquiring large amounts of data is impeded by obvious privacy considerations. However, ethnographic research suggests that Twitter users have definite ideas about who their audience is, and that they customize their self-presentation accordingly (Marwick and boyd 2011). Furthermore, contemporary social media platforms such as Twitter and Facebook offer authors increasingly nuanced capabilities for manipulating the composition of their audience, enabling them to reach both within and beyond the social networks defined by explicitly stated friendship ties (called “following” on Twitter; Kwak et al. 2010). We define these affordances in detail below.

This paper examines these notions of audience in the context of a novel dataset with thousands of writers and more than 200 lexical variables. The variables are organized into two sets. The first consists of terms that distinguish major American metropolitan areas from each other, and is obtained using an automatic technique based on the regularized log-odds ratio. The second consists of the most frequently used non-standard terms among Twitter users in the United States. In both cases, we find strong evidence of style-shifting according to audience size and proximity. When communication is intended for an individual recipient, particularly a recipient from the same geographical area as the author, both geographically specific variables and medium-based variables are used at a significantly higher rate. By contrast, when communication is intended to reach a broad audience outside the author's social network, both types of variables are inhibited. These findings use a matched dataset design to control for the identity of the author, showing that individual authors are less likely to use non-standard and geographically specific variables as the intended size of the audience grows.
This provides evidence that individuals modulate their linguistic performance as they use social media affordances to control the intended audience of their messages. It also suggests that these non-standard variables (some of which appear to be endogenous to social media and recent in origin) are already viewed as socially marked, and are regulated accordingly.
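The text says only that the metropolitan-area variables come from "an automatic technique based on the regularized log-odds ratio." One widely used variant is the z-scored log-odds ratio with an informative Dirichlet prior from Monroe et al. (2008), "Fightin' Words," which appears in the references. The Python sketch below assumes that variant, and it also assumes a comparison between one metropolitan area and all remaining areas. The function names `log_odds_z` and `top_markers`, the choice of pooled counts as the prior, and the toy counts are illustrative assumptions. They are not a description of the authors' actual pipeline.

```python
import math
from collections import Counter


def log_odds_z(counts_a, counts_b, prior):
    """Z-scored log-odds ratio with an informative Dirichlet prior.

    Assumed variant (after Monroe et al. 2008), for each word w:
      delta_w = log((y_a + alpha_w) / (n_a + alpha_0 - y_a - alpha_w))
              - log((y_b + alpha_w) / (n_b + alpha_0 - y_b - alpha_w))
      var_w  ~= 1 / (y_a + alpha_w) + 1 / (y_b + alpha_w)
      z_w     = delta_w / sqrt(var_w)
    Positive z means w is over-represented in group a relative to group b.
    """
    alpha_0 = sum(prior.values())
    n_a = sum(counts_a.values())
    n_b = sum(counts_b.values())
    scores = {}
    for w, alpha_w in prior.items():
        if alpha_w <= 0:
            continue  # every scored word needs positive prior mass
        y_a = counts_a.get(w, 0)
        y_b = counts_b.get(w, 0)
        delta = (math.log((y_a + alpha_w) / (n_a + alpha_0 - y_a - alpha_w))
                 - math.log((y_b + alpha_w) / (n_b + alpha_0 - y_b - alpha_w)))
        var = 1.0 / (y_a + alpha_w) + 1.0 / (y_b + alpha_w)
        scores[w] = delta / math.sqrt(var)
    return scores


def top_markers(counts_a, counts_b, k=10):
    """Rank the k words most characteristic of group a.

    Uses the pooled counts of both groups as the prior (an assumption).
    """
    prior = counts_a + counts_b  # Counter addition pools the two groups
    scores = log_odds_z(counts_a, counts_b, prior)
    return sorted(scores, key=scores.get, reverse=True)[:k]


# Toy, invented counts: one hypothetical metro area versus all other areas.
metro = Counter({"hella": 30, "lol": 50, "the": 500})
elsewhere = Counter({"hella": 2, "lol": 60, "the": 600})
print(top_markers(metro, elsewhere, k=1))  # prints ['hella']
```

The prior shrinks estimates for rare words toward the background rate. Dividing by the approximate standard deviation then favors words whose difference is both large and well supported by counts, which is the property a variable-selection step like the one described would need.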
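The matched dataset design compares each author with themselves across audience conditions. Here is one way such a comparison could be computed, in Python, under stated assumptions. Each message is assumed to carry an author ID, an audience-condition label, and a token list. The labels `addressed`, `default`, and `broadcast` are hypothetical placeholders, because the affordances are defined elsewhere in the paper. Only authors observed in every condition are kept, and each author's per-condition rate of variable tokens is averaged with equal weight per author. The function `matched_rates` and the toy messages are illustrative only. They are not the authors' statistical model.

```python
from collections import defaultdict

# Hypothetical audience-condition labels, from smallest to largest intended audience.
CONDITIONS = ("addressed", "default", "broadcast")


def matched_rates(messages, variables, conditions=CONDITIONS):
    """Average per-author rate of variable tokens in each audience condition.

    messages: iterable of (author, condition, tokens) triples.
    Only authors with at least one token in every condition are kept, so each
    condition's average is computed over the same set of authors.
    """
    # author -> condition -> [variable-token count, total token count]
    counts = defaultdict(lambda: defaultdict(lambda: [0, 0]))
    for author, condition, tokens in messages:
        cell = counts[author][condition]
        cell[0] += sum(1 for t in tokens if t in variables)
        cell[1] += len(tokens)

    # "c in by_cond" avoids inserting empty cells into the defaultdict.
    matched = [by_cond for by_cond in counts.values()
               if all(c in by_cond and by_cond[c][1] > 0 for c in conditions)]

    means = {}
    for c in conditions:
        rates = [by_cond[c][0] / by_cond[c][1] for by_cond in matched]
        means[c] = sum(rates) / len(rates) if rates else float("nan")
    return means, len(matched)


# Toy, invented messages; author "a3" appears in only one condition and is dropped.
variables = {"lol", "u", "hella"}
messages = [
    ("a1", "addressed", ["lol", "u", "coming"]),
    ("a1", "default", ["going", "home", "lol"]),
    ("a1", "broadcast", ["great", "game", "tonight"]),
    ("a2", "addressed", ["hella", "tired"]),
    ("a2", "default", ["so", "tired"]),
    ("a2", "broadcast", ["big", "news", "today", "u"]),
    ("a3", "addressed", ["u", "there"]),
]
means, n_authors = matched_rates(messages, variables)
print(n_authors, means)
```

Averaging within authors first keeps prolific authors from dominating any one condition. Restricting to authors seen in every condition keeps the comparison within the same people. A full analysis would also need significance testing, which this sketch omits.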

[1] W. Labov. The social stratification of English in New York City. 1969.

[2] P. Trudgill. Sex, covert prestige and linguistic change in the urban British English of Norwich. Language in Society, 1972.

[3] A. Bell. Language style as audience design. Language in Society, 1984.

[4] J. Milroy et al. Linguistic change, social network and speaker innovation. Journal of Linguistics, 1985.

[5] Penelope Eckert et al. (ay) goes to the city: Exploring the expressive use of variation. 1996.

[6] Carol Simpson et al. Internet Relay Chat. 2000.

[7] M. McPherson et al. Birds of a Feather: Homophily in Social Networks. 2001.

[8] John C. Paolillo. Language variation on Internet Relay Chat: A social network approach. 2001.

[9] Joseph B. Walther et al. The Impacts of Emoticons on Message Interpretation in Computer-Mediated Communication. 2001.

[10] Jean Aitchison et al. Language and the Internet. Lit. Linguistic Comput., 2002.

[11] N. Coupland. Style and Sociolinguistic Variation: Language, situation, and the relational self: theorizing dialect-style in sociolinguistics. 2002.

[12] P. Eckert et al. Style and Sociolinguistic Variation. 2002.

[13] M. Silverstein. Indexical order and the dialectics of sociolinguistic life. 2003.

[14] Evelyn Ziegler et al. Exploring Language Variation on the Internet: Regional Speech in a Chat Community. 2003.

[15] Crispin Thurlow et al. From Statistical Panic to Moral Panic: The Metadiscursive Construction and Popular Exaggeration of New Media Language in the Print Media. J. Comput. Mediat. Commun., 2006.

[16] John C. Paolillo et al. Gender and genre variation in weblogs. 2006.

[17] Shlomo Argamon et al. Effects of Age and Gender on Blogging. AAAI Spring Symposium: Computational Approaches to Analyzing Weblogs, 2006.

[18] Sali A. Tagliamonte. Analysing Sociolinguistic Variation. 2006.

[19] John W. Du Bois. The stance triangle. 2007.

[20] N. Coupland. Style: Language Variation and Identity. 2007.

[21] Susan C. Herring et al. A Faceted Classification Scheme for Computer-Mediated Discourse. 2007.

[22] S. Tagliamonte et al. Linguistic ruin? LOL! Instant messaging and teen language. 2008.

[23] P. Auer et al. Style and Social Identities: Alternative Approaches to Linguistic Heterogeneity. 2008.

[24] S. Kiesling et al. Indexicality and experience: Exploring the meanings of /aw/-monophthongization in Pittsburgh. 2008.

[25] Burt L. Monroe et al. Fightin' Words: Lexical Feature Selection and Evaluation for Identifying the Content of Political Conflict. Political Analysis, 2008.

[26] S. Herring et al. Beyond Microblogging: Conversation and Collaboration via Twitter. 2009 42nd Hawaii International Conference on System Sciences, 2009.

[27] Brendan T. O'Connor et al. A Latent Variable Model for Geographic Lexical Variation. EMNLP, 2010.

[28] Danah Boyd et al. Social Network Sites: Definition, History, and Scholarship. J. Comput. Mediat. Commun., 2007.

[29] S. Herring et al. Functions of the Nonverbal in CMC: Emoticons and Illocutionary Force. 2010.

[30] David Yarowsky et al. Classifying latent user attributes in twitter. SMUC '10, 2010.

[31] Hosung Park et al. What is Twitter, a social network or a news media? WWW '10, 2010.

[32] H. Giles et al. Contexts of Accommodation: Developments in Applied Sociolinguistics. 2010.

[33] Efthimis N. Efthimiadis et al. Conversational tagging in twitter. HT '10, 2010.

[34] Kyumin Lee et al. You are where you tweet: a content-based approach to geo-locating twitter users. CIKM, 2010.

[35] Hsia-Ching Chang et al. A new perspective on Twitter hashtag use: Diffusion of innovation theory. ASIST, 2010.

[36] Susan T. Dumais et al. Mark my words!: linguistic style accommodation in social media. WWW, 2011.

[37] Danah Boyd et al. I tweet honestly, I tweet passionately: Twitter users, context collapse, and the imagined audience. New Media Soc., 2011.

[38] Stephen Pax Leonard et al. Language change and digital media: A review of conceptions and evidence. 2011.

[39] Eric P. Xing et al. Sparse Additive Generative Models of Text. ICML, 2011.

[40] James Caverlee et al. A geographic study of tie strength in social media. CIKM '11, 2011.

[41] Ed H. Chi et al. Tweets from Justin Bieber's heart: the dynamics of the location field in user profiles. CHI, 2011.

[42] Jason Baldridge et al. Simple supervised document geolocation with geodesic grids. ACL, 2011.

[43] John D. Burger et al. Discriminating Gender on Twitter. EMNLP, 2011.

[44] Eric P. Xing et al. Discovering Sociolinguistic Associations with Structured Sparsity. ACL, 2011.

[45] Hila Becker et al. Hip and trendy: Characterizing emerging trends on Twitter. J. Assoc. Inf. Sci. Technol., 2011.

[46] Sara Rosenthal et al. Age Prediction in Blogs: A Study of Style, Content, and Online Behavior in Pre- and Post-Social Media Generations. ACL, 2011.

[47] Eric Gilbert et al. Phrases that signal workplace hierarchy. CSCW, 2012.

[48] Owen Rambow et al. Predicting Overt Display of Power in Written Dialogs. NAACL, 2012.

[49] David Bamman et al. Gender identity and lexical variation in social media. arXiv:1210.4567, 2012.

[50] Jon M. Kleinberg et al. Echoes of power: language effects and power differences in social interaction. WWW, 2011.

[51] Brendan T. O'Connor et al. Diffusion of Lexical Change in Social Media. PLoS ONE, 2012.

[52] Jacob Eisenstein et al. Phonological Factors in Social Media Writing. 2013.

[53] Ian R. Johnson. Audience Design and Communication Accommodation Theory: Use of Twitter by Welsh–English Biliterates. 2013.

[54] Jure Leskovec et al. A computational approach to politeness with application to social factors. ACL, 2013.

[55] Michael S. Bernstein et al. Quantifying the invisible audience in social networks. CHI, 2013.

[56] Dong Nguyen et al. "How Old Do You Think I Am?" A Study of Language and Age in Twitter. ICWSM, 2013.

[57] Jacob Eisenstein et al. What to do about bad language on the internet. NAACL, 2013.

[58] David Jurgens et al. That's What Friends Are For: Inferring Location in Online Social Media Platforms Based on Social Relationships. ICWSM, 2013.

[59] Jannis Androutsopoulos. Languaging when contexts collapse: Audience design in social networking. 2014.

[60] Matthew Crosby et al. Association for the Advancement of Artificial Intelligence. 2014.

[61] Dong Nguyen et al. Audience and the Use of Minority Languages on Twitter. ICWSM, 2015.