Using unsupervised information to improve semi-supervised tweet sentiment classification

Abstract

Supervised algorithms require a set of representative labeled data to build classification models. However, labeled data are usually difficult and expensive to obtain, which motivates interest in semi-supervised learning. This type of learning uses both labeled and unlabeled data during training. It is particularly useful in applications such as tweet sentiment analysis, where large amounts of unlabeled data are available. Semi-supervised learning for tweet sentiment analysis, although appealing, is relatively new. We propose a semi-supervised learning framework that combines a classifier with unsupervised information captured by a similarity matrix constructed from unlabeled data. Our motivation is that such a similarity matrix is a powerful knowledge-discovery tool that can help classify sets of unlabeled tweets. The framework uses the well-known self-training algorithm to induce a better tweet sentiment classifier. Experimental results on real-world datasets demonstrate that the proposed framework can improve the accuracy of tweet sentiment analysis.
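The abstract does not specify how the similarity matrix and the classifier are combined. The following Python sketch is a rough illustration of one plausible way to feed unlabeled-data similarity into self-training. It is not the authors' framework, and every name in it is hypothetical. It assumes that tweets are already encoded as bag-of-words vectors with two classes (0 = negative, 1 = positive). The base classifier is a stand-in nearest-centroid model with a softmax over cosine similarities (`centroid_probs`), and its sharpening factor of 5.0 is chosen arbitrarily. In each round, the classifier's class probabilities for each unlabeled tweet are replaced by a similarity-weighted average over the tweet itself and its neighbors. The most confident tweets are then pseudo-labeled and added to the training set.

```python
import math
from typing import Dict, List, Sequence, Tuple

Vector = List[float]


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity; 0.0 if either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0


def similarity_matrix(xs: List[Vector]) -> List[List[float]]:
    """Pairwise cosine similarities among unlabeled tweets, clipped at 0, zero diagonal."""
    n = len(xs)
    return [[0.0 if i == j else max(0.0, cosine(xs[i], xs[j])) for j in range(n)]
            for i in range(n)]


def centroid_probs(train: List[Tuple[Vector, int]], x: Vector) -> List[float]:
    """Hypothetical base classifier: softmax over the cosine similarity between x
    and the centroid of each class (0 = negative, 1 = positive)."""
    scores = []
    for c in (0, 1):
        members = [v for v, y in train if y == c]
        if not members:
            scores.append(-1.0)  # class not represented yet: lowest possible cosine
            continue
        cen = [sum(col) / len(members) for col in zip(*members)]
        scores.append(cosine(cen, x))
    exps = [math.exp(5.0 * s) for s in scores]  # 5.0: arbitrary sharpening factor
    total = sum(exps)
    return [e / total for e in exps]


def self_train(labeled: List[Tuple[Vector, int]], unlabeled: List[Vector],
               per_round: int = 1, max_rounds: int = 10) -> Dict[int, int]:
    """Self-training where classifier probabilities are smoothed with a similarity
    matrix built once from the unlabeled tweets. Returns {unlabeled index: pseudo-label}."""
    sim = similarity_matrix(unlabeled)
    n = len(unlabeled)
    train = list(labeled)
    assigned: Dict[int, int] = {}
    for _ in range(max_rounds):
        pending = [i for i in range(n) if i not in assigned]
        if not pending:
            break
        # Retrain (recompute centroids) and predict every unlabeled tweet.
        base = [centroid_probs(train, x) for x in unlabeled]
        refined = {}
        for i in pending:
            # Weighted average over the tweet itself (weight 1) and its neighbors (weight = similarity).
            total_w = 1.0 + sum(sim[i])
            refined[i] = [(base[i][c] + sum(sim[i][j] * base[j][c] for j in range(n))) / total_w
                          for c in (0, 1)]
        # Most confident tweets first; ties broken by index for determinism.
        ranked = sorted(pending, key=lambda i: (-max(refined[i]), i))
        for i in ranked[:per_round]:
            label = 1 if refined[i][1] > refined[i][0] else 0
            assigned[i] = label
            train.append((unlabeled[i], label))
    return assigned


# Toy bag-of-words vectors over the vocabulary ["good", "bad", "movie"].
labeled = [([1, 0, 1], 1), ([0, 1, 1], 0)]
unlabeled = [[2, 0, 1], [0, 2, 1], [1, 0, 0], [0, 1, 0]]
print(sorted(self_train(labeled, unlabeled).items()))  # → [(0, 1), (1, 0), (2, 1), (3, 0)]
```

Averaging through the similarity matrix lets a tweet whose own prediction is uncertain borrow evidence from similar tweets before it is pseudo-labeled. The number of tweets added per round (`per_round`) and the number of rounds (`max_rounds`) are illustrative knobs, not values taken from the paper.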