A Lexicon-Based Multi-class Semantic Orientation Analysis for Microblogs

Most existing work on semantic orientation analysis focuses on distinguishing two polarities (positive and negative). In this paper, we propose a lexicon-based multi-class semantic orientation analysis for microblogs. To better capture social attention to public events, we add Concern to the conventional psychological classes of sentiment and build a sentiment lexicon with five categories (Concern, Joy, Blue, Anger, Fear). The seed words of the lexicon are extracted from HowNet, NTUSD, and catchwords in Sina Weibo posts. HowNet semantic similarity is then used to detect additional sentiment words and enrich the lexicon. Using this lexicon, each Weibo post is represented as a multi-dimensional numerical vector in feature space. To classify the posts, we adopt a Semi-Supervised Gaussian Mixture Model (Semi-GMM) and an adaptive K-nearest neighbour (KNN) classifier that uses symmetric Kullback-Leibler divergence (KL-divergence) as its similarity measure. We compare the proposed methods with several competitive baselines, e.g., majority vote, KNN with cosine similarity, and SVM. The experimental evaluation shows that the proposed methods outperform the other approaches by a large margin in terms of accuracy and F1 score.
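The KNN step with symmetric KL-divergence can be illustrated with a minimal sketch. It is not the paper's implementation: it assumes each post vector holds non-negative per-category lexicon scores, normalizes them into a probability distribution with a small smoothing constant, takes the symmetric divergence as the sum of the two directed KL terms, and uses a fixed k rather than the adaptive scheme the abstract mentions. The function names, the smoothing constant, and the toy data are all hypothetical.

```python
import math
from collections import Counter

# The five sentiment categories named in the abstract; vector order is an assumption.
CATEGORIES = ["Concern", "Joy", "Blue", "Anger", "Fear"]


def to_distribution(scores, eps=1e-9):
    """Turn non-negative category scores into a smoothed probability distribution."""
    smoothed = [s + eps for s in scores]  # eps avoids log(0) and division by zero
    total = sum(smoothed)
    return [s / total for s in smoothed]


def symmetric_kl(p, q):
    """KL(p||q) + KL(q||p), which simplifies to sum((p_i - q_i) * log(p_i / q_i))."""
    return sum((pi - qi) * math.log(pi / qi) for pi, qi in zip(p, q))


def knn_classify(query, labeled, k=3):
    """Majority label among the k labeled posts closest to query under symmetric KL."""
    q = to_distribution(query)
    ranked = sorted(labeled, key=lambda item: symmetric_kl(q, to_distribution(item[0])))
    votes = Counter(label for _, label in ranked[:k])
    return votes.most_common(1)[0][0]


# Toy labeled posts: (category score vector, label). Invented for illustration only.
labeled_posts = [
    ([3, 0, 0, 0, 0], "Concern"),
    ([4, 1, 0, 0, 0], "Concern"),
    ([0, 5, 0, 0, 0], "Joy"),
    ([0, 4, 1, 0, 0], "Joy"),
    ([0, 0, 0, 3, 0], "Anger"),
]

predicted = knn_classify([2, 0, 0, 0, 0], labeled_posts, k=3)
```

Because the divergence is smaller for more similar posts, neighbours are ranked in ascending order. Other symmetrizations exist (for example, halving the sum), and they would leave the neighbour ranking unchanged.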
