Predicting the Political Alignment of Twitter Users

The widespread adoption of social media for political communication creates unprecedented opportunities to monitor the opinions of large numbers of politically active individuals in real time. However, without a way to distinguish between users of opposing political alignments, conflicting signals at the individual level may, in the aggregate, obscure partisan differences in opinion that are important to political strategy. In this article we describe several methods for predicting the political alignment of Twitter users based on the content and structure of their political communication in the run-up to the 2010 U.S. midterm elections. Using a data set of 1,000 manually annotated individuals, we find that a support vector machine (SVM) trained on hashtag metadata outperforms an SVM trained on the full text of users' tweets, predicting political affiliation with 91% accuracy. Applying latent semantic analysis to the content of users' tweets, we identify hidden structure in the data that is strongly associated with political affiliation, but we do not find that topic detection improves prediction performance. All of these content-based methods are outperformed by a classifier based on the segregated community structure of political information diffusion networks (95% accuracy). We conclude with a practical application of this machinery to web-based political advertising, and outline several approaches to public opinion monitoring based on the techniques developed herein.
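To make the network-based idea concrete, the following is a minimal sketch of how segregated community structure could be turned into alignment predictions. It is an illustration under stated assumptions, not the method reported above: the abstract does not name the algorithm, so a simple seeded label propagation (neighbor majority vote) stands in for it, and the function name `propagate_labels`, the user names, the edges, and the "left"/"right" labels are all hypothetical.

```python
from collections import Counter


def propagate_labels(edges, seeds, max_iters=20):
    """Spread seed labels over an undirected graph by neighbor majority vote.

    Assumption: this is a stand-in technique chosen for illustration,
    not the classifier described in the abstract.
    """
    # Build an adjacency map from the edge list.
    neighbors = {}
    for a, b in edges:
        neighbors.setdefault(a, set()).add(b)
        neighbors.setdefault(b, set()).add(a)

    labels = dict(seeds)  # seed labels stay fixed throughout
    for _ in range(max_iters):
        updates = {}
        for user in sorted(neighbors):
            if user in seeds:
                continue
            # Count the labels currently held by this user's neighbors.
            votes = Counter(labels[n] for n in neighbors[user] if n in labels)
            if not votes:
                continue
            ranked = votes.most_common()
            # Skip ties so the result does not depend on iteration order.
            if len(ranked) > 1 and ranked[0][1] == ranked[1][1]:
                continue
            if labels.get(user) != ranked[0][0]:
                updates[user] = ranked[0][0]
        if not updates:
            break
        # Apply all changes at once (synchronous update).
        labels.update(updates)
    return labels


# Hypothetical toy network: two tight clusters joined by one edge (c-d).
toy_edges = [
    ("a", "b"), ("b", "c"), ("a", "c"),
    ("c", "d"),
    ("d", "e"), ("e", "f"), ("d", "f"),
]
toy_seeds = {"a": "left", "f": "right"}  # hypothetical annotated users
predicted = propagate_labels(toy_edges, toy_seeds)
print(predicted)
```

In this toy network each cluster inherits the label of its single annotated member, which is the intuition behind using community structure: when interaction is heavily segregated by alignment, a user's neighbors are informative about the user.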
