Cross-Lingual Sentiment Classification Based on Denoising Autoencoder

Sentiment classification systems rely on high-quality sentiment resources, but these resources are unevenly distributed across languages. How to leverage the rich labeled data of one language (the source language) for sentiment classification in a resource-poor language (the target language), a task known as cross-lingual sentiment classification (CLSC), has become an active research topic. This paper uses rich English resources for Chinese sentiment classification. To bridge the language gap between English and Chinese, it proposes a combined CLSC approach based on denoising autoencoders. First, two classifiers based on denoising autoencoders are trained, one in the English view and one in the Chinese view, using an English corpus and an English-to-Chinese translated corpus. Second, the Chinese test data and the Chinese-to-English translated test data are classified with the two classifiers trained in the two views. Finally, the results from the two views are combined to obtain the final sentiment classification. Experiments are carried out on the NLP&CC 2013 CLSC dataset, which covers the book, DVD and music categories. The results show that the approach achieves an accuracy of 80.02%, outperforming the current state-of-the-art systems.
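The abstract does not specify the noise type used by the denoising autoencoders or the rule used to combine the two views. The following is a minimal sketch under stated assumptions: masking noise for the input corruption and a weighted average of the per-view positive-class probabilities for the combination. The names `mask_corrupt`, `combine_views` and `weight_en` are hypothetical and are not taken from the paper.

```python
import numpy as np


def mask_corrupt(x, rate, rng):
    """Masking noise (assumed noise type): zero each feature with probability `rate`.

    A denoising autoencoder is trained to reconstruct the clean `x`
    from the corrupted copy returned here.
    """
    keep = rng.random(x.shape) >= rate
    return x * keep


def combine_views(p_en, p_zh, weight_en=0.5):
    """Combine positive-class probabilities from the English and Chinese views.

    Assumed rule: weighted average, then threshold at 0.5.
    Returns the combined probabilities and the 0/1 labels.
    """
    p_en = np.asarray(p_en, dtype=float)
    p_zh = np.asarray(p_zh, dtype=float)
    p = weight_en * p_en + (1.0 - weight_en) * p_zh
    return p, (p >= 0.5).astype(int)


# Illustrative values only, not results from the paper.
p_en = [0.9, 0.4, 0.2]   # English-view classifier on Chinese-to-English test data
p_zh = [0.7, 0.7, 0.4]   # Chinese-view classifier on Chinese test data
probs, labels = combine_views(p_en, p_zh)
# labels is [1, 1, 0]
```

With equal weights, the second review is labeled positive even though the English view alone would label it negative, which illustrates how one view can correct the other.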
