Online and Social Media Data As an Imperfect Continuous Panel Survey

There is a large body of research on using online activity as a survey of political opinion to predict real-world election outcomes. There is considerably less work, however, on using this data to understand topic-specific interest and opinion among the general population and specific demographic subgroups, as currently measured by relatively expensive surveys. Here we investigate this possibility by studying a full census of Twitter activity during the 2012 election cycle along with the comprehensive search history of a large panel of Internet users during the same period, highlighting the challenges in interpreting online and social media activity as the results of a survey. As noted in existing work, the online population is a non-representative sample of the offline world (e.g., the U.S. voting population). We extend this work to show that demographic skew and user participation are non-stationary and difficult to predict over time. In addition, the nature of user contributions varies substantially around important events. Furthermore, we note subtle problems in mapping what people share or consume online to specific sentiment or opinion measures around a particular topic. We provide a framework, built around treating this data as an imperfect continuous panel survey, for addressing these issues so that meaningful insight about public interest and opinion can be reliably extracted from online and social media data.
