Influence factor based opinion mining of Twitter data using supervised learning

Social networking portals are widely used to express opinions publicly through internet-based text messages and images. Among them, Twitter has attracted researchers working on problems such as predicting electoral outcomes, consumer brand perception, movie box-office performance, stock market movements, and the popularity of celebrities. Sentiment analysis over Twitter offers a fast and efficient way of monitoring public sentiment. In this paper, we introduce a novel approach that exploits the user influence factor to predict the outcome of an election. We also propose a hybrid approach that extracts opinion from direct and indirect features of Twitter data using supervised classifiers based on Support Vector Machines (SVM), Naive Bayes, Maximum Entropy, and Artificial Neural Networks. We combine Principal Component Analysis (PCA) with SVM to perform dimensionality reduction. The paper presents two case studies of very different social scenarios: the 2012 US Presidential Elections and the 2013 Karnataka Assembly Elections. We identify the conditions under which Twitter may succeed or fail in predicting election outcomes. Experimental results show that SVM outperforms all other classifiers, with a maximum prediction accuracy of 88% for the US Presidential Elections held in November 2012 and 58% for the Karnataka State Assembly Elections held in May 2013.
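The abstract does not state how the user influence factor enters the election prediction. The sketch below shows one plausible reading, under explicit assumptions: each tweet has already been labeled by one of the supervised classifiers (the labels are simply given here), the influence of an author is taken as `1 + log(1 + followers)` purely for illustration, and a candidate's predicted share is the influence-weighted mass of positive tweets about them. The `Tweet` type, `influence`, and `predicted_vote_share` are hypothetical names, not the paper's method.

```python
import math
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class Tweet:
    candidate: str  # candidate the tweet refers to
    sentiment: int  # classifier output: +1 positive, -1 negative, 0 neutral
    followers: int  # follower count of the tweet's author


def influence(followers: int) -> float:
    """Hypothetical influence factor: every author counts at least once,
    and follower count adds a log-damped bonus so huge accounts do not
    completely dominate."""
    return 1.0 + math.log1p(followers)


def predicted_vote_share(tweets):
    """Return each candidate's share of influence-weighted positive sentiment.

    Only positive tweets contribute (an assumption of this sketch);
    an empty dict means there was no positive signal at all.
    """
    support = defaultdict(float)
    for t in tweets:
        if t.sentiment > 0:
            support[t.candidate] += influence(t.followers)
    total = sum(support.values())
    if total == 0:
        return {}
    return {cand: s / total for cand, s in support.items()}


if __name__ == "__main__":
    tweets = [
        Tweet("A", 1, 1000),  # one positive tweet from a well-followed user
        Tweet("A", -1, 50),   # negative tweets are ignored in this sketch
        Tweet("B", 1, 10),
        Tweet("B", 1, 20),
    ]
    shares = predicted_vote_share(tweets)
    print(max(shares, key=shares.get), shares)
```

With these example tweets, a plain count of positive tweets would favor B (two to one), while the influence-weighted share favors A because its single supporter has far more followers. That shift is the kind of effect an influence factor is meant to capture; the paper's actual weighting may differ.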