Sentiment Classification in Chinese Microblogs: Lexicon-based and Learning-based Approaches

Sentiment classification is more challenging for Chinese microblogs than for Twitter, for numerous reasons. In this paper, two kinds of approaches are proposed to classify opinionated Chinese microblog posts: 1) lexicon-based approaches that combine the Simple Sentiment Word-Count Method with three Chinese sentiment lexicons, and 2) machine learning models with multiple features. In our experiments, the lexicon-based approaches yield reasonably good results, and the machine learning classifiers outperform both the majority baseline and the lexicon-based approaches. Among the machine learning approaches, Random Forests performs best, with satisfactory results.
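
As an illustration of the lexicon-based idea, the following is a minimal sketch of a sentiment word-count classifier. It is not the authors' implementation: the tiny lexicons, the function name `word_count_sentiment`, the assumption that posts are already segmented into words, and the tie-handling rule (ties are labeled neutral) are all hypothetical choices made for this example.

```python
from typing import Iterable, Set

# Toy lexicons: hypothetical entries for illustration only,
# not taken from the three Chinese sentiment lexicons used in the paper.
POSITIVE: Set[str] = {"好", "喜欢", "开心"}
NEGATIVE: Set[str] = {"差", "讨厌", "失望"}


def word_count_sentiment(
    tokens: Iterable[str],
    positive: Set[str] = POSITIVE,
    negative: Set[str] = NEGATIVE,
) -> str:
    """Label a pre-segmented post by comparing lexicon hit counts."""
    tokens = list(tokens)
    pos_hits = sum(1 for t in tokens if t in positive)
    neg_hits = sum(1 for t in tokens if t in negative)
    if pos_hits > neg_hits:
        return "positive"
    if neg_hits > pos_hits:
        return "negative"
    # Assumption: equal counts (including zero hits) are treated as neutral.
    return "neutral"


if __name__ == "__main__":
    print(word_count_sentiment(["我", "喜欢", "这个", "电影"]))
    print(word_count_sentiment(["服务", "很", "差", "失望"]))
```

Chinese text has no whitespace word boundaries, so a real pipeline would need a word segmentation step before the lookup; that step is left out here to keep the sketch self-contained.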
