A Comparative Study of Cross-Lingual Sentiment Classification

The task of sentiment classification relies heavily on sentiment resources, including annotated lexicons and corpus. However, the sentiment resources in different languages are imbalanced. In particular, many reliable English resources are available on the Web, while reliable Chinese resources are scarce till now. Cross-lingual sentiment classification is a promising way for addressing the above problem by leveraging only English resources for Chinese sentiment classification. In this study, we conduct a comparative study to explore the challenges of cross-lingual sentiment classification. Different schemes for cross-lingual sentiment classification based on two dimensions have been compared empirically. Lastly, we propose to combine the different individual schemes into an ensemble. Experiment results demonstrate the effectiveness of the proposed method.

[1]  Maosong Sun,et al.  Experimental Study on Sentiment Classification of Chinese Review using Machine Learning Techniques , 2007, 2007 International Conference on Natural Language Processing and Knowledge Engineering.

[2]  Qiang Yang,et al.  Co-clustering based classification for out-of-domain documents , 2007, KDD '07.

[3]  Sabine Bergler,et al.  When Specialists and Generalists Work Together: Overcoming Domain Dependence in Sentiment Tagging , 2008, ACL.

[4]  Hsin-Hsi Chen,et al.  Opinion Extraction, Summarization and Tracking in News and Blog Corpora , 2006, AAAI Spring Symposium: Computational Approaches to Analyzing Weblogs.

[5]  ChengXiang Zhai,et al.  A two-stage approach to domain adaptation for statistical classifiers , 2007, CIKM '07.

[6]  Marco Maggini,et al.  An EM based training algorithm for cross-language text categorization , 2005, The 2005 IEEE/WIC/ACM International Conference on Web Intelligence (WI'05).

[7]  Bing Liu,et al.  Opinion observer: analyzing and comparing opinions on the Web , 2005, WWW '05.

[8]  Qiang Yang,et al.  Can chinese web pages be classified with english data source? , 2008, WWW.

[9]  John Blitzer,et al.  Biographies, Bollywood, Boom-boxes and Blenders: Domain Adaptation for Sentiment Classification , 2007, ACL.

[10]  Jonathon Read,et al.  Using Emoticons to Reduce Dependency in Machine Learning Techniques for Sentiment Classification , 2005, ACL.

[11]  Janyce Wiebe,et al.  Recognizing Contextual Polarity in Phrase-Level Sentiment Analysis , 2005, HLT.

[12]  Núria Bel,et al.  Cross-Lingual Text Categorization , 2003, ECDL.

[13]  Carlo Strapparava,et al.  Cross Language Text Categorization by Acquiring Multilingual Domain Models from Comparable Corpora , 2005, ParallelText@ACL.

[14]  Watanabe Hideo,et al.  Deeper Sentiment Analysis Using Machine Translation Technology , 2004, COLING.

[15]  Tom B. Y. Lai,et al.  Polarity Classification of Celebrity Coverage in the Chinese Press , 2005 .

[16]  Bo Pang,et al.  A Sentimental Education: Sentiment Analysis Using Subjectivity Summarization Based on Minimum Cuts , 2004, ACL.

[17]  Qiang Yang,et al.  Transferring Naive Bayes Classifiers for Text Classification , 2007, AAAI.

[18]  Nigel Collier,et al.  Sentiment Analysis using Support Vector Machines with Diverse Information Sources , 2004, EMNLP.

[19]  Soo-Min Kim,et al.  Determining the Sentiment of Opinions , 2004, COLING.

[20]  Wen Shi,et al.  Sentiment Classification for Movie Reviews in Chinese by Improved Semantic Oriented Approach , 2006, Proceedings of the 39th Annual Hawaii International Conference on System Sciences (HICSS'06).

[21]  Xiaojun Wan,et al.  Using Bilingual Knowledge and Ensemble Techniques for Unsupervised Chinese Sentiment Analysis , 2008, EMNLP.

[22]  Rada Mihalcea,et al.  Learning Multilingual Subjective Language via Cross-Lingual Projections , 2007, ACL.

[23]  Mirella Lapata,et al.  Ensemble Methods for Unsupervised WSD , 2006, ACL.

[24]  Roberto Basili,et al.  Learning to Classify Text Using Support Vector Machines: Methods, Theory, and Algorithms by Thorsten Joachims , 2003, Comput. Linguistics.

[25]  Rada Mihalcea,et al.  Multilingual Subjectivity Analysis Using Machine Translation , 2008, EMNLP.

[26]  Bo Pang,et al.  Thumbs up? Sentiment Classification using Machine Learning Techniques , 2002, EMNLP.

[27]  Thorsten Joachims,et al.  Transductive Inference for Text Classification using Support Vector Machines , 1999, ICML.

[28]  Thomas G. Dietterich Machine-Learning Research Four Current Directions , 1997 .

[29]  Peter D. Turney Thumbs Up or Thumbs Down? Semantic Orientation Applied to Unsupervised Classification of Reviews , 2002, ACL.

[30]  Sebastian Thrun,et al.  Text Classification from Labeled and Unlabeled Documents using EM , 2000, Machine Learning.

[31]  Daniel Marcu,et al.  Domain Adaptation for Statistical Classifiers , 2006, J. Artif. Intell. Res..

[32]  Qiang Yang,et al.  Topic-bridged PLSA for cross-domain text classification , 2008, SIGIR '08.

[33]  Ivan Titov,et al.  A Joint Model of Text and Aspect Ratings for Sentiment Summarization , 2008, ACL.

[34]  Khurshid Ahmad,et al.  Sentiment Polarity Identification in Financial News: A Cohesion-based Approach , 2007, ACL.

[35]  Mike Wells,et al.  Structured Models for Fine-to-Coarse Sentiment Analysis , 2007, ACL.

[36]  Alistair Kennedy,et al.  SENTIMENT CLASSIFICATION of MOVIE REVIEWS USING CONTEXTUAL VALENCE SHIFTERS , 2006, Comput. Intell..

[37]  Claire Cardie,et al.  OpinionFinder: A System for Subjectivity Analysis , 2005, HLT.