Chinese Micro-Blog Emotion Classification by Exploiting Linguistic Features and SVMperf

In recent years, micro-blog emotion mining has become one of the research hotspots in social network data mining. In contrast to existing studies, this paper presents a novel method for emotion classification: an SVM\(^{\textit{perf}}\)-based method combined with the syntactic structure of Chinese micro-blogs. The emotion types classified are Happiness, Anger, Disgust, Fear, Sadness and Surprise. In the proposed method, an emotional lexicon is first constructed and linguistic features are extracted from the micro-blog corpus. Second, because the resulting feature space is high-dimensional, the Chi-square test is used to select keywords that have both high frequency and high class relevance. At the same time, Pointwise Mutual Information (PMI) is used to pick out effective low-frequency words during feature dimension reduction, which reduces the computational complexity. Finally, SVM\(^{\textit{perf}}\) is applied for emotion classification. To illustrate the effectiveness of the algorithm, LIBSVM and SVM-Light are used as baselines. Data from Sina Micro-blog (weibo.com) are used in the experiments. The experimental results demonstrate that all of the above features contribute to emotion classification in micro-blogs, and they validate the feasibility of the proposed approach. They also show that SVM\(^{\textit{perf}}\) is an appropriate choice of classifier for emotion classification.
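The two feature-scoring steps named in the abstract can be illustrated with a minimal sketch. It uses the standard textbook definitions of the Chi-square statistic (on a 2x2 term/class contingency table) and of PMI over document counts; the abstract does not give the exact formulas, thresholds, or tokenization the authors used, so the function names, the toy English tokens, and the document-level counting below are all assumptions made for illustration.

```python
import math
from collections import Counter


def term_class_counts(docs, labels):
    """Count documents per class, per term, and per (term, class) pair."""
    class_df = Counter(labels)
    term_df = Counter()
    joint_df = Counter()
    for tokens, label in zip(docs, labels):
        for term in set(tokens):  # document frequency: count each term once per doc
            term_df[term] += 1
            joint_df[(term, label)] += 1
    return class_df, term_df, joint_df


def chi_square(n, a, term_total, class_total):
    """Chi-square statistic for one term and one class.

    n           -- total number of documents
    a           -- documents in the class that contain the term
    term_total  -- documents (any class) that contain the term
    class_total -- documents in the class
    """
    b = term_total - a       # contain the term, outside the class
    c = class_total - a      # in the class, without the term
    d = n - a - b - c        # neither
    denom = (a + b) * (c + d) * (a + c) * (b + d)
    return 0.0 if denom == 0 else n * (a * d - b * c) ** 2 / denom


def pmi(n, a, term_total, class_total):
    """Pointwise mutual information (base 2) between a term and a class."""
    if a == 0:
        return float("-inf")
    return math.log2(a * n / (term_total * class_total))


# Hypothetical toy corpus (already tokenized); real input would be segmented Chinese text.
docs = [
    ["happy", "great", "day"],
    ["happy", "smile"],
    ["angry", "traffic", "day"],
    ["angry", "furious"],
]
labels = ["Happiness", "Happiness", "Anger", "Anger"]

class_df, term_df, joint_df = term_class_counts(docs, labels)
n = len(docs)
for term in sorted(term_df):
    a = joint_df[(term, "Happiness")]
    chi = chi_square(n, a, term_df[term], class_df["Happiness"])
    score = pmi(n, a, term_df[term], class_df["Happiness"])
    print(f"{term:8s} chi2={chi:.2f} pmi={score:.2f}")
```

In this toy corpus, a term such as "day" that appears equally in both classes gets a Chi-square score of zero, while a rare but class-specific term such as "smile" still receives a positive PMI. That is the kind of low-frequency word the abstract says PMI is meant to retain.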
