Cross-Lingual Sentiment Classification via Bi-view Non-negative Matrix Tri-Factorization

Sentiment classification has recently attracted the interest of researchers worldwide, but most sentiment corpora are in English, which limits research progress on sentiment classification in other languages. Cross-lingual sentiment classification aims to use annotated sentiment corpora in one language (e.g. English) as training data to predict the sentiment polarity of data in another language (e.g. Chinese). In this paper, we design a bi-view non-negative matrix tri-factorization (BNMTF) model for the cross-lingual sentiment classification problem. We employ a machine translation service so that both the training and the test data have two representations, one in the source language and the other in the target language. Our BNMTF model is derived from the non-negative matrix tri-factorization models of both languages in order to make more accurate predictions. It has three main advantages: (1) it combines the information from the two views, (2) it incorporates lexical knowledge and the label knowledge of the training documents, and (3) it adds information from the test documents. Experimental results show the effectiveness of our BNMTF model, which can outperform baseline approaches to cross-lingual sentiment classification.
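As background for the factorization named above, the following is a minimal sketch of plain (single-view) non-negative matrix tri-factorization, which approximates a non-negative matrix X by F S G^T using standard multiplicative updates. It is not the BNMTF model itself: the bi-view coupling, the lexical prior, and the label constraints are omitted, and the function name `nmtf`, its parameters, and the choice of unconstrained updates are assumptions made for illustration only.

```python
import numpy as np

def nmtf(X, k_rows, k_cols, n_iter=200, eps=1e-9, seed=0):
    """Approximate a non-negative matrix X (m x n) as F @ S @ G.T.

    F (m x k_rows) holds row-cluster memberships, G (n x k_cols) holds
    column-cluster memberships, and S (k_rows x k_cols) links the two.
    Hypothetical helper: plain NMTF, without the orthogonality, lexical,
    or label terms that a sentiment-specific model would add.
    """
    rng = np.random.default_rng(seed)
    m, n = X.shape
    # Strictly positive random initialization keeps the updates well defined.
    F = rng.random((m, k_rows)) + eps
    S = rng.random((k_rows, k_cols)) + eps
    G = rng.random((n, k_cols)) + eps
    losses = []
    for _ in range(n_iter):
        # Multiplicative updates for the squared Frobenius reconstruction error;
        # each factor stays non-negative because it is scaled by a non-negative ratio.
        F *= (X @ G @ S.T) / (F @ S @ G.T @ G @ S.T + eps)
        G *= (X.T @ F @ S) / (G @ S.T @ F.T @ F @ S + eps)
        S *= (F.T @ X @ G) / (F.T @ F @ S @ G.T @ G + eps)
        losses.append(np.linalg.norm(X - F @ S @ G.T) ** 2)
    return F, S, G, losses

if __name__ == "__main__":
    # Toy non-negative matrix standing in for a document-term count matrix.
    X = np.random.default_rng(1).random((20, 30))
    F, S, G, losses = nmtf(X, k_rows=2, k_cols=2, n_iter=50)
    print(F.shape, S.shape, G.shape)
    print(losses[0], losses[-1])
```

With two clusters on one side (e.g. positive and negative), the row of F or G with the larger entry can be read as a soft polarity assignment. A bi-view variant would, in principle, run one such factorization per language and tie the document factors together, but the exact coupling is defined in the paper and is not reproduced here.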
