CIEA: A Corpus for Chinese Implicit Emotion Analysis

The euphemistic tradition of Han Chinese culture has deep ideological roots. China has long embraced Confucianism, which has led Chinese people to express their emotions implicitly. Spoken language contains almost no explicit emotion words, which poses a challenge for Chinese sentiment analysis. It is therefore of interest to build a corpus whose texts contain no emotion words, so that the category of the expressed emotion must be determined from the detailed description in the text. In this study, we propose a corpus for Chinese implicit sentiment analysis. To build it, we crawled millions of microblogs and obtained the corpus after data cleaning and processing. Based on this corpus, we applied conventional models and neural networks to implicit sentiment analysis and achieved promising results. A comparative experiment with a well-known corpus showed the importance of implicit emotions for emotion classification. These results not only demonstrate the usefulness of the proposed corpus for implicit sentiment analysis research but also provide a baseline for further research on this topic.
