Sentiment analysis of Chinese micro-blog using vector space model

In recent years, mining micro-blogs has become an active research field, particularly because it can create commercial and political value in a fast-changing big data era. This paper investigates the sentiment analysis of Chinese micro-blogs (SACM) using a vector space model. Based on an analysis of the inherent properties of Chinese micro-blogs, a sentiment analysis system is proposed that formulates the task as a two-class classification problem, in which each micro-blog is labeled as expressing either positive or negative sentiment. To achieve robust results, a preprocessing approach is developed for the dynamic and complicated expressions found in micro-blogs. It removes emotion-unrelated words, converts traditional Chinese expressions to simplified Chinese, and unifies punctuation. With the aid of word segmentation and word-frequency statistics, a vector space model is then built to generate a sentiment-related feature vector for each micro-blog. A support vector machine (SVM) is used as the classifier because of its strong performance on two-class classification problems. Experiments have been carried out to evaluate the proposed system. Three dictionaries are used in the word segmentation stage: the emotion dictionary from Dalian University of Technology, the CNKI-HowNet emotion dictionary, and our self-established dictionary. Experimental results show that the proposed SACM system achieves a classification accuracy of 80.86% using these dictionaries.
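The pipeline described above (preprocessing, dictionary-based word segmentation, a term-frequency vector space model, then an SVM) can be sketched end to end as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the tiny lexicons, stopword list and traditional-to-simplified table are hypothetical stand-ins for the Dalian University of Technology, CNKI-HowNet and self-established dictionaries; segmentation uses forward maximum matching, which is an assumed choice since the abstract does not name a segmentation algorithm; and the classifier is a homogeneous (no bias term) linear SVM trained with a Pegasos-style subgradient method rather than whatever SVM package the authors used. The toy data cannot reproduce the reported 80.86% accuracy.

```python
import re

# Hypothetical toy resources; a real system would load the DUT emotion
# dictionary, the CNKI-HowNet dictionary and a self-established lexicon.
POSITIVE = {"开心", "喜欢", "爱", "好"}
NEGATIVE = {"难过", "讨厌", "失望", "差"}
STOPWORDS = {"今天", "我", "你", "的", "了", "太", "很"}
# Tiny traditional-to-simplified table (assumption: a full table is used in practice).
TRAD_TO_SIMP = str.maketrans("開愛難過討厭", "开爱难过讨厌")
# Map full-width punctuation to one ASCII form so it is counted uniformly.
PUNCT = str.maketrans("，。！？～", ",.!?~")

LEXICON = POSITIVE | NEGATIVE
DICTIONARY = LEXICON | STOPWORDS          # words known to the segmenter
MAX_LEN = max(len(word) for word in DICTIONARY)
VOCAB = sorted(LEXICON)                   # dimensions of the vector space


def preprocess(text):
    """Drop URLs and @mentions, simplify characters, unify punctuation."""
    text = re.sub(r"https?://\S+|@\S+", " ", text)
    text = text.translate(TRAD_TO_SIMP).translate(PUNCT)
    text = re.sub(r"([,.!?~])\1+", r"\1", text)  # collapse "!!!" to "!"
    return re.sub(r"\s+", " ", text).strip()


def segment(text):
    """Forward maximum matching: take the longest dictionary word at each position."""
    tokens, i = [], 0
    while i < len(text):
        for size in range(min(MAX_LEN, len(text) - i), 0, -1):
            piece = text[i:i + size]
            if size == 1 or piece in DICTIONARY:
                tokens.append(piece)
                i += size
                break
    return tokens


def vectorize(text):
    """Term-frequency vector of a micro-blog over the sentiment vocabulary."""
    tokens = [t for t in segment(preprocess(text)) if t not in STOPWORDS]
    return [tokens.count(term) for term in VOCAB]


def train_linear_svm(vectors, labels, lam=0.1, epochs=20):
    """Pegasos-style subgradient descent on the regularized hinge loss (no bias)."""
    w = [0.0] * len(vectors[0])
    t = 0
    for _ in range(epochs):
        for x, y in zip(vectors, labels):   # deterministic pass order
            t += 1
            eta = 1.0 / (lam * t)
            margin = y * sum(wi * xi for wi, xi in zip(w, x))
            w = [(1 - eta * lam) * wi for wi in w]  # regularization shrink
            if margin < 1:                           # hinge-loss subgradient step
                w = [wi + eta * y * xi for wi, xi in zip(w, x)]
    return w


def predict(w, text):
    """Return +1 for positive sentiment, -1 for negative."""
    score = sum(wi * xi for wi, xi in zip(w, vectorize(text)))
    return 1 if score > 0 else -1


# Hypothetical labeled micro-blogs (+1 positive, -1 negative).
TRAIN = [
    ("今天好開心！！ http://t.cn/abc", 1),
    ("@小明 我愛你", 1),
    ("真的很喜欢，推荐", 1),
    ("太難過了。。。", -1),
    ("討厭下雨！", -1),
    ("服务差，很失望", -1),
]

VECTORS = [vectorize(text) for text, _ in TRAIN]
LABELS = [label for _, label in TRAIN]
WEIGHTS = train_linear_svm(VECTORS, LABELS)
print(predict(WEIGHTS, "好开心"))  # prints 1
```

Restricting the vector space to sentiment-lexicon terms mirrors the abstract's "sentiment-related feature vector": words outside the lexicon are segmented but contribute no dimension. With real data, an intercept term and a tuned regularization constant `lam` would normally be added.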