Cross-domain sentiment classification with word embeddings and canonical correlation analysis

A common approach to automatic sentiment classification is to train classifiers on labeled text data (reviews, blog posts, etc.) and use them to predict the sentiment polarity of new data. Because people express sentiment differently in different domains, this approach requires an annotated corpus for each domain, and annotating data for every domain of interest is laborious and impractical. In this paper, we address the domain adaptation problem for sentiment classification. We explore the effect of two generic methods, word embeddings for feature learning and canonical correlation analysis (CCA) for feature subspace mapping, on cross-domain sentiment classifiers. We show that these generic methods alone yield results that are highly competitive with those of sophisticated methods developed specifically for this problem. An advantage of word embeddings and CCA is that they are available out of the box, which makes the proposed method easy to apply. Experiments on a widely used benchmark dataset show that both word embeddings and CCA improve accuracy and that their combination gives the best results.
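
The two building blocks named above can be illustrated with a minimal sketch. It is not the paper's implementation: the abstract does not say how documents are represented or which two views CCA pairs, so the code assumes (hypothetically) that a document is the average of its word vectors and that CCA is fit on two generic paired feature matrices `X` and `Y`. The function names `doc_vector` and `fit_cca`, the regularizer `reg`, and the toy data are all illustrative assumptions.

```python
import numpy as np


def doc_vector(tokens, embeddings, dim):
    """Average the embeddings of known tokens (assumed document representation)."""
    vecs = [embeddings[t] for t in tokens if t in embeddings]
    return np.mean(vecs, axis=0) if vecs else np.zeros(dim)


def fit_cca(X, Y, k, reg=1e-3):
    """Fit CCA on paired views X (n x p) and Y (n x q).

    Returns projection matrices A (p x k), B (q x k) and the top-k
    canonical correlations. `reg` is a small ridge term added for
    numerical stability (an assumption of this sketch).
    """
    n = X.shape[0]
    Xc = X - X.mean(axis=0)
    Yc = Y - Y.mean(axis=0)
    Sxx = Xc.T @ Xc / (n - 1) + reg * np.eye(X.shape[1])
    Syy = Yc.T @ Yc / (n - 1) + reg * np.eye(Y.shape[1])
    Sxy = Xc.T @ Yc / (n - 1)

    # Whiten each view with its Cholesky factor: T = Lx^{-1} Sxy Ly^{-T}.
    Lx = np.linalg.cholesky(Sxx)
    Ly = np.linalg.cholesky(Syy)
    T = np.linalg.solve(Lx, Sxy)
    T = np.linalg.solve(Ly, T.T).T

    # Singular values of T are the canonical correlations.
    U, s, Vt = np.linalg.svd(T, full_matrices=False)

    # Map the singular vectors back to the original feature spaces.
    A = np.linalg.solve(Lx.T, U[:, :k])
    B = np.linalg.solve(Ly.T, Vt.T[:, :k])
    return A, B, s[:k]


if __name__ == "__main__":
    # Toy embeddings and a toy document (illustrative values only).
    emb = {"good": np.array([1.0, 0.0]), "bad": np.array([0.0, 1.0])}
    print(doc_vector(["good", "bad", "unknown"], emb, dim=2))

    # Two synthetic views that share one latent signal z.
    rng = np.random.default_rng(0)
    z = rng.normal(size=(200, 1))
    X = np.hstack([z + 0.01 * rng.normal(size=(200, 1)), rng.normal(size=(200, 2))])
    Y = np.hstack([rng.normal(size=(200, 2)), -2.0 * z + 0.01 * rng.normal(size=(200, 1))])
    A, B, corr = fit_cca(X, Y, k=1)
    print(corr)  # the leading canonical correlation should be close to 1
```

In a cross-domain setting, projected features such as `X @ A` could be appended to the original features before training a classifier; whether the paper does this is not stated in the abstract.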
