Combining Classification and Clustering for Tweet Sentiment Analysis

The goal of sentiment analysis is to determine the opinions, emotions, and attitudes expressed in source material. In tweet sentiment analysis, the opinions in messages can typically be categorized as positive or negative. To classify them, researchers have used traditional classifiers such as Naive Bayes, Maximum Entropy, and Support Vector Machines (SVM). In this paper, we show that an SVM classifier combined with a cluster ensemble can achieve higher classification accuracy than a stand-alone SVM. In our study, we employed an algorithm, named C3E-SL, capable of combining classifier and cluster ensembles. This algorithm refines tweet classifications using additional information provided by clusterers, under the assumption that similar instances in the same cluster are more likely to share the same class label. The resulting classifier has been shown to be competitive with the best results reported so far in the literature, which suggests that the studied approach is promising for tweet sentiment classification.
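
The refinement idea described above can be illustrated with a minimal sketch. It is not the C3E-SL implementation itself: it assumes that the cluster ensemble is summarized as a co-association matrix (the fraction of partitions in which two tweets share a cluster) and that each tweet's class probabilities are repeatedly averaged with those of its co-clustered neighbors. The function names, the weight `alpha`, the iteration count, and the toy data are all hypothetical choices made for illustration.

```python
import numpy as np

def coassociation(partitions):
    """Fraction of partitions in which each pair of instances shares a cluster."""
    partitions = np.asarray(partitions)
    n = partitions.shape[1]
    sim = np.zeros((n, n))
    for labels in partitions:
        sim += labels[:, None] == labels[None, :]
    sim /= len(partitions)
    np.fill_diagonal(sim, 0.0)  # an instance is not its own neighbor
    return sim

def refine(class_probs, sim, alpha=1.0, n_iter=20):
    """Pull each instance's class distribution toward its co-clustered neighbors.

    alpha (assumed value) weights the clusterers against the classifier;
    alpha = 0 returns the classifier output unchanged.
    """
    pi = np.asarray(class_probs, dtype=float)
    y = pi.copy()
    weight = 1.0 + alpha * sim.sum(axis=1, keepdims=True)
    for _ in range(n_iter):
        y = (pi + alpha * sim @ y) / weight
    return y

# Toy data: 4 tweets, columns are P(positive), P(negative) from a classifier.
probs = np.array([[0.90, 0.10],
                  [0.45, 0.55],   # borderline tweet
                  [0.20, 0.80],
                  [0.10, 0.90]])
# Two hypothetical clusterings of the same 4 tweets.
partitions = [[0, 0, 1, 1],
              [0, 0, 0, 1]]

sim = coassociation(partitions)
refined = refine(probs, sim)
print(refined.round(3))
```

In this toy setting the borderline second tweet shares a cluster with the confidently positive first tweet in both partitions, so its positive probability moves upward after refinement, while every row remains a valid probability distribution.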
