Beyond Binary Labels: Political Ideology Prediction of Twitter Users

Automatic political orientation prediction from social media posts has to date proven successful only in distinguishing between publicly declared liberals and conservatives in the US. This study examines users’ political ideology on a seven-point scale, which enables us to identify politically moderate and neutral users, groups that are of particular interest to political scientists and pollsters. Using a novel data set with political ideology labels self-reported through surveys, our goal is two-fold: a) to characterize the groups of politically engaged users through their language use on Twitter; b) to build a fine-grained model that predicts the political ideology of unseen users. Our results identify differences among the groups in both political leaning and political engagement, as well as in the extent to which each group tweets using political keywords. Finally, we demonstrate how to improve ideology prediction accuracy by exploiting the relationships between the user groups.
