Opinion polarity detection in Twitter data combining shrinkage regression and topic modeling

We propose a method for analyzing online public opinion on political issues by automatically detecting polarity in Twitter data. Previous studies have focused on classifying the polarity of individual tweets. However, understanding the direction of public opinion on a political issue also requires analyzing the degree of polarity of the major topics at the center of the discussion, not only that of individual tweets. The first stage of the proposed method detects polarity in tweets using two shrinkage regression models, Lasso and Ridge. These models have the advantage that the regression results provide sentiment scores for the terms that appear in tweets. The second stage identifies the major topics with a latent Dirichlet allocation (LDA) topic model and estimates the degree of polarity of each LDA topic from the term sentiment scores. To the best of our knowledge, our study is the first to predict the polarity of public opinion on topics in this manner. We conducted an experiment on a mayoral election in Seoul, South Korea, and compared the total detection accuracy of the regression models with that of five support vector machine (SVM) models, each using a different number of input terms selected by a feature selection algorithm. The performance of the Ridge model was approximately 7% higher on average than that of the SVM models. Additionally, the degree of polarity of the LDA topics estimated by the proposed method was compared with actual public opinion responses. The polarity detection accuracy of the Lasso model was 83%, indicating that the proposed method was valid in most cases.
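The two stages can be illustrated with a minimal sketch. It is not the authors' implementation: the toy tweets, the labels, the topic-term probabilities (standing in for LDA output), and the helper names `fit_ridge` and `topic_polarity` are all hypothetical. It assumes that term sentiment scores are read off as ridge regression coefficients and that a topic's polarity is the probability-weighted sum of its terms' scores; the abstract does not specify either detail. A Lasso variant would replace the squared (L2) penalty with an absolute-value (L1) penalty.

```python
from collections import Counter

# Hypothetical toy data: (tweet text, label), +1 = supportive, -1 = critical.
tweets = [
    ("great policy strong support", 1),
    ("support the great candidate", 1),
    ("terrible policy weak plan", -1),
    ("weak candidate terrible debate", -1),
]

vocab = sorted({word for text, _ in tweets for word in text.split()})


def vectorize(text):
    """Bag-of-words count vector over the shared vocabulary."""
    counts = Counter(text.split())
    return [float(counts[word]) for word in vocab]


def fit_ridge(X, y, alpha=0.1, lr=0.05, steps=2000):
    """Minimize mean squared error + alpha * ||w||^2 by gradient descent."""
    n, d = len(X), len(X[0])
    w = [0.0] * d
    for _ in range(steps):
        residuals = [
            sum(wj * xj for wj, xj in zip(w, row)) - target
            for row, target in zip(X, y)
        ]
        grad = [
            2.0 * sum(r * row[j] for r, row in zip(residuals, X)) / n
            + 2.0 * alpha * w[j]
            for j in range(d)
        ]
        w = [wj - lr * gj for wj, gj in zip(w, grad)]
    return w


def topic_polarity(term_probs, scores):
    """Assumed estimator: probability-weighted sum of term sentiment scores."""
    return sum(p * scores.get(term, 0.0) for term, p in term_probs.items())


# Stage 1: fit the shrinkage regression; coefficients act as term scores.
X = [vectorize(text) for text, _ in tweets]
y = [float(label) for _, label in tweets]
scores = dict(zip(vocab, fit_ridge(X, y)))

# Stage 2: hypothetical topic-term probabilities standing in for LDA output.
topic_a = {"great": 0.5, "support": 0.3, "policy": 0.2}
topic_b = {"terrible": 0.4, "weak": 0.4, "candidate": 0.2}

for name, topic in [("topic_a", topic_a), ("topic_b", topic_b)]:
    print(name, round(topic_polarity(topic, scores), 3))
```

In this toy corpus, terms that occur equally often in supportive and critical tweets (such as "policy") receive a score near zero, so they contribute little to a topic's estimated polarity.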
