Non-lexical Features Encode Political Affiliation on Twitter

Previous work on classifying Twitter users' political alignment has focused mainly on lexical and social network features. This study provides evidence that political affiliation is also reflected in features that have previously been overlooked: users' discourse patterns (the proportion of their Tweets that are retweets or replies) and their rates of capitalization and punctuation use. We find robust differences between politically left- and right-leaning communities with respect to these discourse and sub-lexical features, although the features alone are not sufficient to train a high-accuracy classifier.
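As an illustration of the kind of features described above, the following is a minimal sketch of how per-user discourse and sub-lexical rates might be computed. It is not the study's implementation: the `Tweet` record, the `user_features` function, the choice to exclude retweeted text from the capitalization and punctuation rates, and the exact denominators (letters for capitalization, non-whitespace characters for punctuation) are all assumptions made for this example.

```python
import string
from dataclasses import dataclass


@dataclass
class Tweet:
    # Hypothetical minimal record; real Twitter data carries many more fields.
    text: str
    is_retweet: bool = False
    is_reply: bool = False


def user_features(tweets):
    """Return discourse and sub-lexical rates for one user's tweets."""
    n = len(tweets)
    if n == 0:
        raise ValueError("need at least one tweet")

    # Assumption: sub-lexical rates use only text the user wrote,
    # so retweeted text is excluded.
    own_text = "".join(t.text for t in tweets if not t.is_retweet)
    letters = [c for c in own_text if c.isalpha()]
    non_space = [c for c in own_text if not c.isspace()]

    capital_rate = (
        sum(c.isupper() for c in letters) / len(letters) if letters else 0.0
    )
    punct_rate = (
        sum(c in string.punctuation for c in non_space) / len(non_space)
        if non_space
        else 0.0
    )

    return {
        # Discourse features: share of tweets that are retweets or replies.
        "retweet_prop": sum(t.is_retweet for t in tweets) / n,
        "reply_prop": sum(t.is_reply for t in tweets) / n,
        # Sub-lexical features: capitalization and punctuation rates.
        "capital_rate": capital_rate,
        "punct_rate": punct_rate,
    }


sample = [
    Tweet("HELLO world!"),
    Tweet("RT something", is_retweet=True),
    Tweet("@bob ok.", is_reply=True),
    Tweet("no caps"),
]
features = user_features(sample)
```

Feature vectors of this shape could then be compared across communities or passed to a classifier; as the abstract notes, such features on their own would not be expected to yield high accuracy.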
