Sentiment analysis of Chinese documents: From sentence to document level

User-generated content on the Web has become an extremely valuable source for mining and analyzing user opinions on any topic. Recent years have seen a growing body of work on methods for recognizing favorable and unfavorable sentiments toward specific subjects in online text. However, most of these efforts focus on English, and very few studies have addressed sentiment analysis of Chinese content. This paper aims to address the unique challenges posed by Chinese sentiment analysis. We propose a two-phase rule-based approach: (1) determining each sentence's sentiment based on word dependency, and (2) aggregating sentence sentiments to predict the document sentiment. We report the results of an experimental study comparing our approach with three machine learning-based approaches on two sets of Chinese articles. These results illustrate the effectiveness of our proposed method and its advantages over learning-based approaches. © 2009 Wiley Periodicals, Inc.
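
The two-phase idea can be illustrated with a minimal Python sketch. It is not the paper's rule set: the lexicon, the negator list, and both function names are hypothetical, negation of the immediately preceding token stands in for real word-dependency rules, and taking the sign of the summed sentence labels is only one assumed way to aggregate. Input is assumed to be already word-segmented.

```python
from typing import Dict, List

# Hypothetical toy lexicon (word -> polarity); not taken from the paper.
LEXICON: Dict[str, int] = {"好": 1, "喜欢": 1, "差": -1, "讨厌": -1}
# Hypothetical negators; a real system would use dependency relations.
NEGATORS = {"不", "没有"}


def sign(x: int) -> int:
    """Return -1, 0, or 1 according to the sign of x."""
    return (x > 0) - (x < 0)


def sentence_sentiment(tokens: List[str]) -> int:
    """Phase 1 stand-in: sum word polarities, flipping a word's
    polarity when the token right before it is a negator."""
    score = 0
    for i, tok in enumerate(tokens):
        polarity = LEXICON.get(tok, 0)
        if polarity and i > 0 and tokens[i - 1] in NEGATORS:
            polarity = -polarity
        score += polarity
    return sign(score)


def document_sentiment(sentences: List[List[str]]) -> int:
    """Phase 2 stand-in: aggregate sentence labels by the sign of their sum."""
    return sign(sum(sentence_sentiment(s) for s in sentences))


doc = [
    ["我", "喜欢", "这", "部", "电影"],  # positive
    ["剧情", "不", "差"],                # negated negative -> positive
    ["音乐", "很", "差"],                # negative
]
print(document_sentiment(doc))
```

Here two positive sentences outweigh one negative sentence, so the document is labeled positive. A weighted aggregation (for example, by sentence position or length) would slot into `document_sentiment` without changing phase 1.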
