Validation of Twitter opinion trends with national polling aggregates: Hillary Clinton vs Donald Trump

Measuring and forecasting opinion trends from real-time social media is a long-standing goal of big-data analytics. Despite the large body of work addressing this question, opinion trends from online social media have not been clearly validated against traditional surveys. Here we develop a method to infer the opinion of Twitter users by combining the statistical physics of complex networks with machine learning based on hashtag co-occurrence, which we use to build an in-domain training set on the order of a million tweets. We validate our method in the context of the 2016 US Presidential Election by comparing the Twitter opinion trend with the New York Times National Polling Average, an aggregate of hundreds of independent traditional polls. The Twitter opinion trend follows the aggregated NYT polls with remarkable accuracy. We investigate the dynamics of the social network formed by the interactions among millions of Twitter supporters and infer the support of each user for the presidential candidates. Our analytics use Twitter to uncover social trends ranging from elections and brands to political movements, at a fraction of the cost of traditional surveys.
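The idea of using hashtag co-occurrence to build a training set can be illustrated with a minimal Python sketch. It is a simplified stand-in, not the paper's actual pipeline: it counts how often hashtags appear together, extends a small set of hand-labeled seed hashtags to hashtags that co-occur with only one camp, and then labels tweets whose hashtags all point to the same camp. The hashtags (`#teama`, `#goa`, and so on), the camp names, the `min_weight` threshold, and all function names are hypothetical and chosen only for illustration.

```python
import re
from collections import Counter, defaultdict
from itertools import combinations

HASHTAG = re.compile(r"#(\w+)")


def hashtags(text):
    """Return the sorted, lowercased, de-duplicated hashtags of a tweet."""
    return sorted({h.lower() for h in HASHTAG.findall(text)})


def cooccurrence(tweets):
    """Count how many tweets contain each unordered pair of hashtags."""
    weights = Counter()
    for text in tweets:
        for a, b in combinations(hashtags(text), 2):
            weights[(a, b)] += 1
    return weights


def expand_seeds(weights, seeds, min_weight=2):
    """Extend seed labels to hashtags that co-occur with exactly one camp.

    `seeds` maps a hashtag to a camp label. `min_weight` is an assumed
    threshold that ignores rare co-occurrences.
    """
    votes = defaultdict(Counter)
    for (a, b), w in weights.items():
        if w < min_weight:
            continue
        for tag, other in ((a, b), (b, a)):
            if other in seeds and tag not in seeds:
                votes[tag][seeds[other]] += w
    labels = dict(seeds)
    for tag, counter in votes.items():
        if len(counter) == 1:  # unambiguous: linked to a single camp only
            labels[tag] = next(iter(counter))
    return labels


def label_tweet(text, labels):
    """Return the camp of a tweet, or None if it is unlabeled or mixed."""
    camps = {labels[h] for h in hashtags(text) if h in labels}
    return camps.pop() if len(camps) == 1 else None


# Toy data: hypothetical tweets and seed hashtags.
tweets = [
    "Rally tonight #teama #goa",
    "So proud #teama #goa",
    "Vote now #teamb #gob",
    "Big day #teamb #gob",
    "Debate night #teama #teamb",
    "Feeling good #goa",
]
seeds = {"teama": "A", "teamb": "B"}

labels = expand_seeds(cooccurrence(tweets), seeds)
training_set = [(t, label_tweet(t, labels)) for t in tweets]
training_set = [(t, camp) for t, camp in training_set if camp is not None]
```

In this toy run, the tweet containing only `#goa` enters the training set because `#goa` co-occurs solely with the seed of one camp, while the tweet mentioning both seeds is left out as ambiguous. A labeled set of this kind could then be passed to any supervised text classifier.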
