Paper information - Using unsupervised information to improve semi-supervised tweet sentiment classification

Abstract: Supervised algorithms require a set of representative labeled data for building classification models. However, labeled data are usually difficult and expensive to obtain, which motivates the interest in semi-supervised learning. This type of learning uses both labeled and unlabeled data in the training process and is particularly useful in applications such as tweet sentiment analysis, where a large amount of unlabeled data is available. Semi-supervised learning for tweet sentiment analysis, although quite appealing, is relatively new. We propose a semi-supervised learning framework that combines unsupervised information, captured from a similarity matrix constructed from unlabeled data, with a classifier. Our motivation is that such a similarity matrix is a powerful knowledge-discovery tool that can help classify unlabeled tweet sets. Our framework makes use of the well-known Self-training algorithm to induce a better tweet sentiment classifier. Experimental results on real-world datasets demonstrate that the proposed framework can improve the accuracy of tweet sentiment analysis.
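To make the idea in the abstract concrete, the following is a minimal sketch of generic self-training in which a classifier's predictions on unlabeled tweets are blended with information from a similarity matrix built over those same tweets. It is an illustration under stated assumptions, not the paper's actual algorithm: the nearest-centroid classifier, the cosine similarity, the blending weight `alpha`, the confidence `threshold`, and all function names are hypothetical choices made for this example.

```python
import math
from collections import Counter

def vectorize(text):
    """Bag-of-words term counts for one tweet (assumed representation)."""
    return Counter(text.lower().split())

def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

def class_probs(vec, centroids):
    """Toy classifier: normalised cosine similarity to each class centroid."""
    scores = {c: cosine(vec, cen) + 1e-9 for c, cen in centroids.items()}
    total = sum(scores.values())
    return {c: s / total for c, s in scores.items()}

def self_train(labeled, unlabeled, alpha=0.5, threshold=0.6, max_rounds=10):
    """labeled: list of (text, label); unlabeled: list of text.
    Returns the labeled list enlarged with confidently pseudo-labeled tweets."""
    labeled, pool = list(labeled), list(unlabeled)
    for _ in range(max_rounds):
        if not pool:
            break
        # 1. Fit the toy classifier: one summed term-count centroid per class.
        centroids = {}
        for text, label in labeled:
            centroids.setdefault(label, Counter()).update(vectorize(text))
        vecs = [vectorize(t) for t in pool]
        probs = [class_probs(v, centroids) for v in vecs]
        # 2. Unsupervised information: pairwise similarity among unlabeled tweets.
        sim = [[cosine(a, b) for b in vecs] for a in vecs]
        # 3. Blend each tweet's probabilities with its neighbours' weighted average.
        refined = []
        for i, p in enumerate(probs):
            others = [j for j in range(len(pool)) if j != i]
            weight = sum(sim[i][j] for j in others)
            blended = {}
            for c in p:
                if weight > 0:
                    neigh = sum(sim[i][j] * probs[j][c] for j in others) / weight
                else:
                    neigh = p[c]  # no similar neighbours: keep the classifier's view
                blended[c] = (1 - alpha) * p[c] + alpha * neigh
            refined.append(blended)
        # 4. Self-training step: move confident predictions into the labeled set.
        confident = [(i, max(r, key=r.get))
                     for i, r in enumerate(refined)
                     if max(r.values()) >= threshold]
        if not confident:
            break
        labeled.extend((pool[i], label) for i, label in confident)
        chosen = {i for i, _ in confident}
        pool = [t for i, t in enumerate(pool) if i not in chosen]
    return labeled

seed = [("i love this great phone", "pos"), ("i hate this awful phone", "neg")]
tweets = ["love love great day", "awful hate terrible", "great day"]
result = dict(self_train(seed, tweets))
```

In this toy run, `result` maps each of the three unlabeled tweets to a pseudo-label alongside the two seed tweets. A real system would replace the centroid classifier and the simple blending rule with stronger components; the sketch only shows where the similarity matrix enters the self-training loop.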
