Bilingual Co-Training for Sentiment Classification of Chinese Product Reviews

The lack of reliable Chinese sentiment resources limits research progress on Chinese sentiment classification. However, there are many freely available English sentiment resources on the Web. This article focuses on the problem of cross-lingual sentiment classification, which leverages only available English resources for Chinese sentiment classification. We first investigate several basic methods (including lexicon-based methods and corpus-based methods) for cross-lingual sentiment classification by simply leveraging machine translation services to eliminate the language gap, and then propose a bilingual co-training approach to make use of both the English view and the Chinese view based on additional unlabeled Chinese data. Experimental results on two test sets show the effectiveness of the proposed approach, which can outperform basic methods and transductive methods.

[1]  Maria-Florina Balcan,et al.  Co-Training and Expansion: Towards Bridging Theory and Practice , 2004, NIPS.

[2]  Rada Mihalcea,et al.  Learning Multilingual Subjective Language via Cross-Lingual Projections , 2007, ACL.

[3]  Massih-Reza Amini,et al.  Learning from Multiple Partially Observed Views - an Application to Multilingual Text Categorization , 2009, NIPS.

[4]  Thorsten Joachims,et al.  Transductive Inference for Text Classification using Support Vector Machines , 1999, ICML.

[5]  Tao Li,et al.  A Non-negative Matrix Tri-factorization Approach to Sentiment Classification with Lexical Prior Knowledge , 2009, ACL.

[6]  Janyce Wiebe,et al.  Recognizing Contextual Polarity in Phrase-Level Sentiment Analysis , 2005, HLT.

[7]  Zhi-Hua Zhou,et al.  A New Analysis of Co-Training , 2010, ICML.

[8]  Qiang Yang,et al.  Co-clustering based classification for out-of-domain documents , 2007, KDD '07.

[9]  Jiahao Zhang,et al.  Sample cutting method for imbalanced text sentiment classification based on BRC , 2013, Knowl. Based Syst..

[10]  Stan Matwin,et al.  Email classification with co-training , 2011, CASCON.

[11]  Ellen Riloff,et al.  Learning Extraction Patterns for Subjective Expressions , 2003, EMNLP.

[12]  Vincent Ng,et al.  Mine the Easy, Classify the Hard: A Semi-Supervised Approach to Automatic Sentiment Classification , 2009, ACL.

[13]  Qiong Wu,et al.  SentiRank: Cross-Domain Graph Ranking for Sentiment Classification , 2009, 2009 IEEE/WIC/ACM International Joint Conference on Web Intelligence and Intelligent Agent Technology.

[14]  Philip J. Stone,et al.  Extracting Information. (Book Reviews: The General Inquirer. A Computer Approach to Content Analysis) , 1967 .

[15]  Benno Stein,et al.  Cross-Language Text Classification Using Structural Correspondence Learning , 2010, ACL.

[16]  Qiang Yang,et al.  Transferring Naive Bayes Classifiers for Text Classification , 2007, AAAI.

[17]  Miles Osborne,et al.  Statistical Machine Translation , 2010, Encyclopedia of Machine Learning and Data Mining.

[18]  James R. Curran,et al.  Bootstrapping POS-taggers using unlabelled data , 2003, CoNLL.

[19]  Marshall S. Smith,et al.  The general inquirer: A computer approach to content analysis. , 1967 .

[20]  Sebastian Thrun,et al.  Text Classification from Labeled and Unlabeled Documents using EM , 2000, Machine Learning.

[21]  Massih-Reza Amini,et al.  A co-classification approach to learning from multilingual corpora , 2010, Machine Learning.

[22]  Xiaojun Wan,et al.  Co-Training for Cross-Lingual Sentiment Classification , 2009, ACL.

[23]  Claire Cardie,et al.  Weakly Supervised Natural Language Learning Without Redundant Views , 2003, NAACL.

[24]  Bo Pang,et al.  A Sentimental Education: Sentiment Analysis Using Subjectivity Summarization Based on Minimum Cuts , 2004, ACL.

[25]  Xiaojun Wan,et al.  Using Bilingual Knowledge and Ensemble Techniques for Unsupervised Chinese Sentiment Analysis , 2008, EMNLP.

[26]  Daniel Marcu,et al.  Domain Adaptation for Statistical Classifiers , 2006, J. Artif. Intell. Res..

[27]  Bing Liu,et al.  Opinion observer: analyzing and comparing opinions on the Web , 2005, WWW '05.

[28]  Lei Shi,et al.  Cross Language Text Classification by Model Translation and Semi-Supervised Learning , 2010, EMNLP.

[29]  Barbara J. Grosz,et al.  Natural-Language Processing , 1982, Artificial Intelligence.

[30]  Núria Bel,et al.  Cross-Lingual Text Categorization , 2003, ECDL.

[31]  Carlo Strapparava,et al.  Cross Language Text Categorization by Acquiring Multilingual Domain Models from Comparable Corpora , 2005, ParallelText@ACL.

[32]  Rada Mihalcea,et al.  Multilingual Subjectivity Analysis Using Machine Translation , 2008, EMNLP.

[33]  Rada Mihalcea,et al.  Co-training and Self-training for Word Sense Disambiguation , 2004, CoNLL.

[34]  Anoop Sarkar,et al.  Applying Co-Training Methods to Statistical Parsing , 2001, NAACL.

[35]  Mike Wells,et al.  Structured Models for Fine-to-Coarse Sentiment Analysis , 2007, ACL.

[36]  Nigel Collier,et al.  Sentiment Analysis using Support Vector Machines with Diverse Information Sources , 2004, EMNLP.

[37]  Bo Pang,et al.  Thumbs up? Sentiment Classification using Machine Learning Techniques , 2002, EMNLP.

[38]  Alistair Kennedy,et al.  SENTIMENT CLASSIFICATION of MOVIE REVIEWS USING CONTEXTUAL VALENCE SHIFTERS , 2006, Comput. Intell..

[39]  Christopher Joseph Pal,et al.  Cross Lingual Adaptation: An Experiment on Sentiment Classifications , 2010, ACL.

[40]  Thorsten Joachims,et al.  Learning to classify text using support vector machines - methods, theory and algorithms , 2002, The Kluwer international series in engineering and computer science.

[41]  Qiang Yang,et al.  Topic-bridged PLSA for cross-domain text classification , 2008, SIGIR '08.

[42]  Simone Paolo Ponzetto,et al.  BabelNet: The automatic construction, evaluation and application of a wide-coverage multilingual semantic network , 2012, Artif. Intell..

[43]  Hsin-Hsi Chen,et al.  Opinion Extraction, Summarization and Tracking in News and Blog Corpora , 2006, AAAI Spring Symposium: Computational Approaches to Analyzing Weblogs.

[44]  Ivan Titov,et al.  A Joint Model of Text and Aspect Ratings for Sentiment Summarization , 2008, ACL.

[45]  Khurshid Ahmad,et al.  Sentiment Polarity Identification in Financial News: A Cohesion-based Approach , 2007, ACL.

[46]  John Blitzer,et al.  Biographies, Bollywood, Boom-boxes and Blenders: Domain Adaptation for Sentiment Classification , 2007, ACL.

[47]  Hiroshi Kanayama,et al.  Deeper Sentiment Analysis Using Machine Translation Technology , 2004, COLING.

[48]  Jonathon Read,et al.  Using Emoticons to Reduce Dependency in Machine Learning Techniques for Sentiment Classification , 2005, ACL.

[49]  ChengXiang Zhai,et al.  A two-stage approach to domain adaptation for statistical classifiers , 2007, CIKM '07.

[50]  Marco Maggini,et al.  An EM based training algorithm for cross-language text categorization , 2005, The 2005 IEEE/WIC/ACM International Conference on Web Intelligence (WI'05).

[51]  Yiming Yang,et al.  A re-examination of text categorization methods , 1999, SIGIR '99.

[52]  Peter D. Turney Thumbs Up or Thumbs Down? Semantic Orientation Applied to Unsupervised Classification of Reviews , 2002, ACL.

[53]  Qiang Yang,et al.  Can chinese web pages be classified with english data source? , 2008, WWW.

[54]  Qiang Yang,et al.  Cross-domain sentiment classification via spectral feature alignment , 2010, WWW '10.

[55]  Nathalie Japkowicz,et al.  The class imbalance problem: A systematic study , 2002, Intell. Data Anal..

[56]  Soo-Min Kim,et al.  Determining the Sentiment of Opinions , 2004, COLING.

[57]  Claire Cardie,et al.  OpinionFinder: A System for Subjectivity Analysis , 2005, HLT.

[58]  Maosong Sun,et al.  Experimental Study on Sentiment Classification of Chinese Review using Machine Learning Techniques , 2007, 2007 International Conference on Natural Language Processing and Knowledge Engineering.

[59]  JapkowiczNathalie,et al.  The class imbalance problem: A systematic study , 2002 .

[60]  Sabine Bergler,et al.  When Specialists and Generalists Work Together: Overcoming Domain Dependence in Sentiment Tagging , 2008, ACL.

[61]  Avrim Blum,et al.  The Bottleneck , 2021, Monopsony Capitalism.