Sentiment detection and visualization of Chinese micro-blog

Micro-blogs are increasingly used by the public to express opinions and by organisations to gauge public sentiment about social events. In contrast to the effort and progress made in English-language micro-blog analysis, research on Chinese micro-blogs has received relatively little attention. In this paper we examine and identify the key problems of this field, focusing on three characteristics of Chinese “Weibo”: innovative words, emoticon elements and hierarchical structure. Based on this analysis we propose and develop theoretical and technological methods to address these problems. These include a new sentiment word mining method based on three wording standards and point-wise metrics, a rule set model for analyzing the sentiment features of different linguistic components, and a corresponding methodology for calculating sentiment at multiple granularities that takes emoticon elements into account. We use original Chinese tweets from a Sina Weibo dataset to test and evaluate our new word discovery and sentiment detection methods. Initial results show that our new-word dictionary improves sentiment detection, and that our multi-level rule set method is more effective, achieving 10.2% and 1.5% higher average accuracy than two existing methods for Chinese micro-blog sentiment analysis. In addition, we use visualisation techniques to study the relationship between online sentiment and real life, which can help depict the correlation between public emotions and events.
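
As a rough illustration of how a point-wise metric might score candidate new words, the sketch below assumes the metric is pointwise mutual information (PMI) computed from tweet-level co-occurrence with small positive and negative seed sets. The abstract does not name the metric, the seed words, or the three wording standards, so the function names, seed words and toy tweets here are all hypothetical and are not the authors' method.

```python
import math
from collections import Counter
from itertools import combinations


def build_pmi(docs):
    """Return pmi(a, b) estimated from tweet-level co-occurrence counts.

    `docs` is a list of already-segmented tweets (lists of tokens).
    Assumption: word segmentation has been done elsewhere.
    """
    n_docs = len(docs)
    word_df = Counter()  # number of tweets containing each word
    pair_df = Counter()  # number of tweets containing each unordered pair
    for tokens in docs:
        vocab = set(tokens)
        word_df.update(vocab)
        pair_df.update(frozenset(p) for p in combinations(sorted(vocab), 2))

    def pmi(a, b):
        joint = pair_df[frozenset((a, b))]
        if joint == 0:
            return 0.0  # assumption: pairs never seen together count as neutral
        return math.log2(joint * n_docs / (word_df[a] * word_df[b]))

    return pmi


def sentiment_orientation(word, pmi, pos_seeds, neg_seeds):
    """PMI with positive seeds minus PMI with negative seeds."""
    pos = sum(pmi(word, s) for s in pos_seeds)
    neg = sum(pmi(word, s) for s in neg_seeds)
    return pos - neg


# Hypothetical toy corpus: each inner list is one segmented tweet.
toy_docs = [
    ["给力", "好"],
    ["给力", "好", "开心"],
    ["坑爹", "差"],
    ["坑爹", "差", "难过"],
    ["好", "开心"],
    ["差", "难过"],
]
toy_pmi = build_pmi(toy_docs)
so_geili = sentiment_orientation("给力", toy_pmi, ["好"], ["差"])
so_kengdie = sentiment_orientation("坑爹", toy_pmi, ["好"], ["差"])
print(so_geili, so_kengdie)
```

A positive score suggests the candidate word leans positive and a negative score suggests it leans negative. A real system would also need frequency thresholds and smoothing, which this sketch omits.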
