Bi-view semi-supervised active learning for cross-lingual sentiment classification

Sentiment classification has recently received considerable attention in the natural language processing research community. However, because most recent work on sentiment classification has focused on English, sentiment resources in other languages remain scarce, and manually constructing reliable sentiment resources is difficult and time-consuming. Cross-lingual sentiment classification aims to use annotated sentiment resources in one language (typically English) to classify the sentiment of text documents in another language. Most existing approaches rely on automatic machine translation services to project information directly from one language to another. Relying on machine translation alone, however, raises two main problems: the term distribution of the original documents differs from that of the translated documents, and the translation itself introduces errors. To overcome these problems, we propose a novel learning model that combines active learning and semi-supervised co-training to incorporate unlabelled data from the target language into the learning process in a bi-view framework. In an iterative process, the model enriches the training data with the most confident automatically labelled examples as well as a few of the most informative manually labelled examples drawn from the unlabelled data. The model also takes the density of the unlabelled data into account, so that active learning selects more representative examples and avoids outliers. The proposed model was applied to book review datasets in three different languages. Experiments showed that it can effectively improve cross-lingual sentiment classification performance and reduce labelling effort in comparison with some baseline methods.
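As an illustration only, one selection step of such an iterative process might be sketched as follows. The abstract does not specify the confidence, informativeness, or density measures, so everything here is an assumption: confidence and informativeness are taken as the binary entropy of the two views' averaged positive-class probability, density is taken as the mean cosine similarity to the k nearest unlabelled neighbours, and all function and parameter names (`select`, `density`, `n_auto`, `n_manual`, `k`) are hypothetical.

```python
import math


def entropy(p):
    """Binary entropy (in bits) of a positive-class probability p."""
    if p <= 0.0 or p >= 1.0:
        return 0.0
    return -(p * math.log2(p) + (1 - p) * math.log2(1 - p))


def cosine(u, v):
    """Cosine similarity of two equal-length feature vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def density(i, vectors, k=2):
    """Assumed density measure: mean similarity of example i to its
    k most similar unlabelled neighbours."""
    sims = sorted(
        (cosine(vectors[i], v) for j, v in enumerate(vectors) if j != i),
        reverse=True,
    )
    top = sims[:k]
    return sum(top) / len(top) if top else 0.0


def select(p_view1, p_view2, vectors, n_auto=1, n_manual=1, k=2):
    """Pick examples to auto-label and examples to send to a human.

    p_view1, p_view2: positive-class probabilities from the two views
    (for instance, source-language and target-language classifiers).
    vectors: feature vectors of the same unlabelled examples.
    """
    avg = [(a + b) / 2 for a, b in zip(p_view1, p_view2)]
    indices = range(len(avg))
    # Most confident examples: lowest entropy of the averaged prediction.
    auto = sorted(indices, key=lambda i: entropy(avg[i]))[:n_auto]
    rest = [i for i in indices if i not in auto]
    # Most informative examples: uncertainty weighted by density, so an
    # uncertain outlier scores lower than an uncertain typical example.
    manual = sorted(
        rest,
        key=lambda i: entropy(avg[i]) * density(i, vectors, k),
        reverse=True,
    )[:n_manual]
    return auto, manual


# Toy data: examples 1 and 2 are equally uncertain, but example 2 is an outlier.
p1 = [0.95, 0.55, 0.50, 0.10]
p2 = [0.90, 0.45, 0.50, 0.20]
vecs = [[1.0, 0.0], [1.0, 0.1], [0.0, 1.0], [1.0, 0.2]]
auto, manual = select(p1, p2, vecs)
print(auto, manual)
```

In this toy run, example 0 is the most confident and would be auto-labelled, while the density weighting steers the manual query to example 1 rather than the equally uncertain outlier, example 2. In a full loop, both sets would be added to the training data, the two view classifiers retrained, and the step repeated.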
