A Novel Machine Learning-based Sentiment Analysis Method for Chinese Social Media Considering Chinese Slang Lexicon and Emoticons

Internet slang is an informal language used in everyday online communication, and new generations quickly adopt or discard it. Similarly, pictograms (emoticons/emojis) are widely used in social media as a means of expressing emotions graphically. Text supported by emoticons can convey delicate nuances. Furthermore, we noticed that when people use new words and pictograms, they tend to express a kind of humorous emotion that is difficult to classify clearly as positive or negative. It is therefore important to understand the influence of Internet slang and emoticons on social media. In this paper, we propose a machine learning method that considers Internet slang and emoticons for sentiment analysis of Weibo, the most popular Chinese social media platform. First, we collected 448 frequent Internet slang expressions as a slang lexicon; we then converted the 109 Weibo emoticons into textual features, creating a Chinese emoticon lexicon. To test the capability of recognizing humorous posts, we used both lexicons with several machine learning approaches (k-Nearest Neighbors, Decision Tree, Random Forest, Logistic Regression, Naïve Bayes, and Support Vector Machine) to detect humorous expressions on Chinese social media. Our experimental results show that the proposed method can significantly improve performance in detecting expressions that are difficult to polarize into positive/negative categories.
