Predicting and Interpolating State‐Level Polls Using Twitter Textual Data

Spatially or temporally dense polling remains both difficult and expensive with existing survey methods. In response, there have been increasing efforts to approximate survey measures using social media, but most of these approaches remain methodologically flawed. To remedy these flaws, this article combines 1,200 state-level polls conducted during the 2012 presidential campaign with over 100 million state-located political tweets; models the polls as a function of the Twitter text using a new linear regularization feature-selection method; and shows via out-of-sample testing that, when properly modeled, the Twitter-based measures track and to some degree predict opinion polls, and can be extended to unpolled states and potentially to substate regions and subday timescales. An examination of the most predictive textual features reveals the topics and events associated with opinion shifts, sheds light on more general theories of partisan difference in attention and information processing, and may be of use for real-time campaign strategy.
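
The general shape of the modeling step (polls regressed on text features under a sparsity-inducing penalty, then checked on held-out observations) can be illustrated with a minimal sketch. This is not the article's new feature-selection method, whose details are not given here. It substitutes the standard lasso (L1-regularized least squares) fitted by coordinate descent, and it runs on synthetic data in which only two of twenty hypothetical "word frequency" features carry signal. All function names, the penalty value, and the data-generating process are assumptions made for illustration.

```python
import math
import random


def soft_threshold(value, penalty):
    """Shrink value toward zero by penalty (the lasso proximal step)."""
    if value > penalty:
        return value - penalty
    if value < -penalty:
        return value + penalty
    return 0.0


def fit_lasso(rows, targets, penalty, sweeps=100):
    """Coordinate descent for (1/2n) * sum(residual^2) + penalty * sum(|w|)."""
    n, p = len(rows), len(rows[0])
    columns = [[row[j] for row in rows] for j in range(p)]
    norms = [sum(x * x for x in col) / n for col in columns]
    intercept = sum(targets) / n
    weights = [0.0] * p
    residuals = [y - intercept for y in targets]
    for _ in range(sweeps):
        for j in range(p):
            if norms[j] == 0.0:
                continue
            col = columns[j]
            # Covariance of feature j with the residual, with feature j's
            # own current contribution added back in.
            rho = sum(x * r for x, r in zip(col, residuals)) / n
            rho += norms[j] * weights[j]
            new_weight = soft_threshold(rho, penalty) / norms[j]
            delta = new_weight - weights[j]
            if delta != 0.0:
                residuals = [r - delta * x for r, x in zip(residuals, col)]
                weights[j] = new_weight
        # The intercept is unpenalized, so it absorbs the mean residual.
        shift = sum(residuals) / n
        intercept += shift
        residuals = [r - shift for r in residuals]
    return intercept, weights


def predict(intercept, weights, row):
    return intercept + sum(w * x for w, x in zip(weights, row))


def rmse(predictions, targets):
    squared = [(a - b) ** 2 for a, b in zip(predictions, targets)]
    return math.sqrt(sum(squared) / len(squared))


def make_synthetic_data(n, p, seed):
    """Synthetic stand-in: rows are state-days, columns are word features."""
    rng = random.Random(seed)
    true_weights = [0.0] * p
    true_weights[0], true_weights[1] = 3.0, -2.0  # only two informative words
    rows, targets = [], []
    for _ in range(n):
        row = [rng.gauss(0.0, 1.0) for _ in range(p)]
        signal = sum(w * x for w, x in zip(true_weights, row))
        targets.append(50.0 + signal + rng.gauss(0.0, 0.5))
        rows.append(row)
    return rows, targets


# Fit on the first 150 observations, evaluate on the 50 held out.
rows, targets = make_synthetic_data(n=200, p=20, seed=0)
train_rows, test_rows = rows[:150], rows[150:]
train_targets, test_targets = targets[:150], targets[150:]

intercept, weights = fit_lasso(train_rows, train_targets, penalty=0.2)
selected = [j for j, w in enumerate(weights) if w != 0.0]

model_rmse = rmse([predict(intercept, weights, r) for r in test_rows], test_targets)
train_mean = sum(train_targets) / len(train_targets)
baseline_rmse = rmse([train_mean] * len(test_targets), test_targets)

print("selected feature indices:", selected)
print("held-out RMSE, lasso:", round(model_rmse, 3))
print("held-out RMSE, mean-only baseline:", round(baseline_rmse, 3))
```

The L1 penalty drives the weights of uninformative features to exactly zero, so the surviving indices in `selected` play the role of the "most predictive textual features" that the abstract examines. Comparing held-out error against a mean-only baseline is a simplified stand-in for out-of-sample testing.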
