Cross Lingual Sentiment Analysis: A Clustering-Based Bee Colony Instance Selection and Target-Based Feature Weighting Approach

The lack of sentiment resources in low-resource languages poses challenges for machine-learning-based sentiment analysis. Cross-lingual and semi-supervised learning are the most common approaches deployed to overcome this issue. However, the performance of existing methods degrades because of the poor quality of translated resources, data sparseness and, more specifically, language divergence. We propose an integrated learning model that combines semi-supervised learning with an ensemble model and exploits the available sentiment resources to tackle issues related to language divergence. Additionally, to reduce the impact of translation errors and handle the instance selection problem, we propose a clustering-based bee colony sample selection method for choosing the optimal training instances, together with target-based feature selection that retains the most distinguishing features representing the target data. To evaluate the proposed model, various experiments are conducted on an English-Arabic cross-lingual data set. Simulation results demonstrate that the proposed model outperforms the baseline approaches in classification performance. Furthermore, the statistical outcomes indicate that the proposed training data sampling and target-based feature selection reduce the negative effect of translation errors. These results show that the proposed approach achieves performance close to that of in-language supervised models.
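The abstract names two components, target-based feature weighting and clustering-based bee colony instance selection, without specifying how they work. The Python sketch below is a hypothetical illustration under stated assumptions, not the authors' method. It assumes that a feature's weight is the fraction of unlabeled target documents containing it (so translation artifacts that never occur in genuine target text get zero weight), clusters the translated training instances with a cosine k-means, and runs a basic artificial bee colony (ABC) search over binary instance masks. The fitness function (closeness to the target centroid plus coverage of all clusters) is an assumed objective. All function names, parameters, and the toy data are hypothetical.

```python
import math
import random
from collections import Counter

def target_feature_weights(target_docs):
    """ASSUMED weighting: share of unlabeled target docs containing each token.
    Tokens never seen in target text (e.g. translation artifacts) get no weight."""
    df = Counter(tok for doc in target_docs for tok in set(doc))
    return {tok: count / len(target_docs) for tok, count in df.items()}

def vectorize(doc, vocab, weights):
    present = set(doc)
    return [weights.get(tok, 0.0) if tok in present else 0.0 for tok in vocab]

def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    nu, nv = math.sqrt(sum(a * a for a in u)), math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0

def centroid(vectors):
    return [sum(col) / len(vectors) for col in zip(*vectors)]

def kmeans(vectors, k, rng, iters=10):
    """Plain Lloyd-style k-means that assigns points by cosine similarity."""
    centers = [list(c) for c in rng.sample(vectors, k)]
    labels = [0] * len(vectors)
    for _ in range(iters):
        labels = [max(range(k), key=lambda c: cosine(v, centers[c])) for v in vectors]
        for c in range(k):
            members = [v for v, lab in zip(vectors, labels) if lab == c]
            if members:
                centers[c] = centroid(members)
    return labels

def fitness(mask, vectors, labels, k, target_center):
    """ASSUMED objective: similarity to the target centroid plus cluster coverage."""
    chosen = [i for i, keep in enumerate(mask) if keep]
    if not chosen:
        return 0.0
    sim = cosine(centroid([vectors[i] for i in chosen]), target_center)
    coverage = len({labels[i] for i in chosen}) / k
    return 0.5 * sim + 0.5 * coverage

def abc_select(vectors, labels, k, target_center, n_sources=6, limit=5, cycles=30, seed=0):
    """Basic artificial bee colony search over binary instance masks (hypothetical)."""
    rng = random.Random(seed)
    n = len(vectors)
    def score(mask):
        return fitness(mask, vectors, labels, k, target_center)
    def random_mask():
        return [rng.random() < 0.5 for _ in range(n)]
    sources = [random_mask() for _ in range(n_sources)]
    fits = [score(m) for m in sources]
    trials = [0] * n_sources
    def try_neighbour(s):
        cand = sources[s][:]
        i = rng.randrange(n)
        cand[i] = not cand[i]  # add or drop one training instance
        f = score(cand)
        if f > fits[s]:
            sources[s], fits[s], trials[s] = cand, f, 0
        else:
            trials[s] += 1
    best = max(range(n_sources), key=lambda s: fits[s])
    best_mask, best_fit = sources[best][:], fits[best]
    for _ in range(cycles):
        for s in range(n_sources):  # employed bees: one local move per source
            try_neighbour(s)
        roulette = [f + 1e-9 for f in fits]
        for s in rng.choices(range(n_sources), weights=roulette, k=n_sources):
            try_neighbour(s)  # onlooker bees favour fitter sources
        for s in range(n_sources):
            if fits[s] > best_fit:
                best_mask, best_fit = sources[s][:], fits[s]
            if trials[s] > limit:  # scout bees abandon stagnant sources
                sources[s] = random_mask()
                fits[s], trials[s] = score(sources[s]), 0
    return best_mask, best_fit

# Toy usage (hypothetical data): translated labeled source docs, unlabeled target docs.
source = [(["good", "phone", "xyzerr"], 1), (["bad", "battery"], 0),
          (["great", "screen"], 1), (["terrible", "service", "xyzerr"], 0),
          (["good", "battery"], 1), (["slow", "phone"], 0)]
target = [["good", "phone"], ["bad", "battery"], ["great", "battery"], ["slow", "screen"]]

weights = target_feature_weights(target)
vocab = sorted({tok for doc, _ in source for tok in doc})
vectors = [vectorize(doc, vocab, weights) for doc, _ in source]
target_center = centroid([vectorize(doc, vocab, weights) for doc in target])
labels = kmeans(vectors, 2, random.Random(0))
mask, fit = abc_select(vectors, labels, 2, target_center)
training_set = [source[i] for i, keep in enumerate(mask) if keep]
```

The equal 0.5/0.5 split between target similarity and cluster coverage is an arbitrary choice for illustration. The coverage term is one way a clustering step can keep the selected subset from collapsing onto a single region of the translated data, while the target-based weights make instances dominated by untranslated or spurious tokens (such as the made-up `xyzerr`) look dissimilar to the target data.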
