Predicting Poll Trends Using Twitter and Multivariate Time-Series Classification

Social media platforms such as Twitter provide valuable information for understanding the social and political climate surrounding particular issues. Millions of people who vary in age, social class, and political beliefs come together in conversation. However, drawing reliable inferences from these tweets is challenging. Using tweets from the 2016 U.S. presidential campaign, this work addresses one main research question: can changes in a political candidate's poll score trend be accurately predicted from tweets created during the campaign? The novelty of this work is that we formulate the problem as multivariate time-series classification, which fits the temporal nature of tweets, rather than as traditional attribute-based classification. Features that represent various aspects of support for (or opposition to) a candidate are tracked hour by hour; together they form a multivariate time series. One commonly used approach to this problem is based on majority voting, which assumes that the univariate time series from different features are equally important. To relax this assumption, we propose a weighted shapelet transformation model. Extensive experiments on more than 12 million tweets posted between November 2015 and January 2016 about four primary candidates (Bernie Sanders, Hillary Clinton, Donald Trump, and Ted Cruz) indicate that the multivariate time-series approach outperforms traditional attribute-based approaches.
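The contrast between majority voting and a weighted combination of per-feature decisions can be illustrated with a minimal Python sketch. It is not the model from this work: the function names, the example labels, and the weights are all hypothetical, and the abstract does not say how the weights are obtained (here they are simply assumed to reflect how reliable each feature's classifier is). The `shapelet_distance` helper shows the standard building block of a shapelet transformation, the minimum Euclidean distance between a shapelet and any equal-length window of a series.

```python
import math
from collections import Counter


def shapelet_distance(series, shapelet):
    """Minimum Euclidean distance between the shapelet and any
    equal-length window of the series."""
    m = len(shapelet)
    return min(
        math.sqrt(sum((series[i + j] - shapelet[j]) ** 2 for j in range(m)))
        for i in range(len(series) - m + 1)
    )


def majority_vote(predictions):
    """Every per-feature prediction counts equally."""
    return Counter(predictions).most_common(1)[0][0]


def weighted_vote(predictions, weights):
    """Each per-feature prediction counts in proportion to its weight."""
    scores = {}
    for label, weight in zip(predictions, weights):
        scores[label] = scores.get(label, 0.0) + weight
    return max(scores, key=scores.get)


# Hypothetical hourly values of one feature, and a candidate shapelet.
hourly_feature = [0, 1, 2, 3, 2, 1]
shapelet = [2, 3, 2]
best_match = shapelet_distance(hourly_feature, shapelet)

# Hypothetical predictions from three per-feature classifiers.
predictions = ["up", "down", "down"]
# Hypothetical weights (assumption: e.g. validation accuracy per feature).
weights = [0.9, 0.3, 0.4]

majority_label = majority_vote(predictions)
weighted_label = weighted_vote(predictions, weights)
```

In this toy case the two schemes disagree: two weak features outvote one strong feature under majority voting, while the weighted scheme follows the feature with the highest weight.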
