A Comparison of Emotion Annotation Approaches for Text

While the recognition of positive/negative sentiment in text is an established task with many standard data sets and well-developed methodologies, the recognition of more nuanced affect has received less attention. There are few publicly available annotated resources, and several competing emotion representation schemes exist with, as yet, no clear way to choose between them. To address this gap, we present a series of emotion annotation studies on tweets. These studies provide methods for comparing annotation methods (relative vs. absolute) and emotion representation schemes. We find higher inter-annotator agreement with a relative annotation scheme (comparisons) on a dimensional emotion model than with a categorical annotation scheme on Ekman’s six basic emotions. However, we also compare agreement for comparisons with agreement for a rating-scale annotation scheme, both using the same dimensional emotion model. Here we find higher agreement with rating scales, which challenges a common belief that relative judgements are more reliable. To support these studies, and as a contribution in itself, we further present a publicly available collection of 2,019 tweets. Each tweet is annotated with scores on four emotion dimensions (valence, arousal, dominance and surprise), following the emotion representation model identified by Fontaine et al. in 2007.
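The abstract compares inter-annotator agreement across categorical labels, comparisons and rating scales, but it does not name the agreement measure at this point. One common choice is Krippendorff's alpha with a distance function matched to the annotation type. The Jaccard [7] and MASI [20] entries in the reference list point in that direction. The following is a minimal sketch under that assumption and is not the authors' implementation. The function names, the data layout (one list of annotator values per item) and the toy labels are all hypothetical.

```python
from itertools import permutations
from typing import Callable, Sequence, TypeVar

T = TypeVar("T")


def nominal_distance(a: object, b: object) -> float:
    """0 if two labels match, 1 otherwise (e.g. which tweet of a pair was chosen)."""
    return 0.0 if a == b else 1.0


def interval_distance(a: float, b: float) -> float:
    """Squared difference, the usual metric for rating-scale values."""
    return float((a - b) ** 2)


def masi_distance(a: frozenset, b: frozenset) -> float:
    """MASI distance (Passonneau, 2006): 1 - Jaccard * monotonicity, for label sets."""
    if not a and not b:
        return 0.0  # two empty label sets are treated as identical
    jaccard = len(a & b) / len(a | b)
    if a == b:
        monotonicity = 1.0
    elif a <= b or b <= a:
        monotonicity = 2 / 3  # one set is a subset of the other
    elif a & b:
        monotonicity = 1 / 3  # the sets overlap but neither contains the other
    else:
        monotonicity = 0.0  # disjoint sets
    return 1.0 - jaccard * monotonicity


def krippendorff_alpha(
    units: Sequence[Sequence[T]], delta: Callable[[T, T], float]
) -> float:
    """Krippendorff's alpha; each unit (tweet or tweet pair) lists its annotators' values.

    Units with fewer than two values cannot be paired and are skipped.
    """
    pairable = [list(u) for u in units if len(u) >= 2]
    values = [v for u in pairable for v in u]
    n = len(values)
    if n < 2:
        raise ValueError("need at least one unit with two or more values")
    # Observed disagreement: ordered value pairs within each unit, weighted by 1/(m_u - 1).
    d_obs = sum(
        sum(delta(x, y) for x, y in permutations(u, 2)) / (len(u) - 1)
        for u in pairable
    ) / n
    # Expected disagreement: ordered pairs drawn from all pairable values, ignoring units.
    d_exp = sum(delta(x, y) for x, y in permutations(values, 2)) / (n * (n - 1))
    if d_exp == 0:
        raise ValueError("all values are identical; alpha is undefined")
    return 1.0 - d_obs / d_exp


if __name__ == "__main__":
    # Hypothetical toy data, not taken from the paper's data set.
    ratings = [[4, 4, 5], [1, 2, 1], [3, 3, 3]]  # 1-5 valence ratings, 3 annotators
    ekman = [
        [frozenset({"joy"}), frozenset({"joy", "surprise"}), frozenset({"joy"})],
        [frozenset({"anger"}), frozenset({"disgust"}), frozenset({"anger", "disgust"})],
    ]
    choices = [["A", "A", "B"], ["B", "B", "B"], ["A", "B", "A"]]  # chosen tweet per pair
    print(round(krippendorff_alpha(ratings, interval_distance), 3))  # 0.881
    print(round(krippendorff_alpha(ekman, masi_distance), 3))  # 0.276
    print(round(krippendorff_alpha(choices, nominal_distance), 3))  # 0.2
```

In this sketch, only the distance function changes between schemes. `nominal_distance` suits a choice between two tweets, `masi_distance` suits sets of Ekman labels, and `interval_distance` suits rating-scale scores. Expected disagreement pools the values from all items, so each alpha is measured against chance for its own scheme.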

References

[1] Udo Hahn et al. EmoBank: Studying the Impact of Annotation Perspective and Representation Format on Dimensional Emotion Analysis, 2017, EACL.

[2] Angeliki Metallinou et al. Annotation and processing of continuous emotional attributes: Challenges and opportunities, 2013, 2013 10th IEEE International Conference and Workshops on Automatic Face and Gesture Recognition (FG).

[3] Saif Mohammad et al. WASSA-2017 Shared Task on Emotion Intensity, 2017, WASSA@EMNLP.

[4] Lyle H. Ungar et al. Modelling Valence and Arousal in Facebook posts, 2016, WASSA@NAACL-HLT.

[5] Carlos Busso et al. The ordinal nature of emotions, 2017, 2017 Seventh International Conference on Affective Computing and Intelligent Interaction (ACII).

[6] Saif Mohammad et al. SemEval-2018 Task 1: Affect in Tweets, 2018, SemEval.

[7] P. Jaccard. The Distribution of the Flora in the Alpine Zone, 1912.

[8] Muhammad Abdul-Mageed et al. EmoNet: Fine-Grained Emotion Detection with Gated Recurrent Neural Networks, 2017, ACL.

[9] Yi-Hsuan Yang et al. Ranking-Based Emotion Recognition for Music Organization and Retrieval, 2011, IEEE Transactions on Audio, Speech, and Language Processing.

[10] Yann LeCun et al. Learning a similarity metric discriminatively, with application to face verification, 2005, 2005 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR'05).

[11] Ian D. Wood et al. Emoji as Emotion Tags for Tweets, 2016.

[12] Iyad Rahwan et al. Using millions of emoji occurrences to learn any-domain representations for detecting sentiment, emotion and sarcasm, 2017, EMNLP.

[13] Johan Bollen et al. Modeling Public Mood and Emotion: Twitter Sentiment and Socio-Economic Phenomena, 2009, ICWSM.

[14] Rebecca J. Passonneau. Computing Reliability for Coreference Annotation, 2004, LREC.

[15] Paul Buitelaar et al. A Comparison Of Emotion Annotation Schemes And A New Annotated Data Set, 2018, LREC.

[16] Claire Cardie et al. 39. Opinion mining and sentiment analysis, 2014.

[17] Lung-Hao Lee et al. Building Chinese Affective Resources in Valence-Arousal Dimensions, 2016, NAACL.

[18] James O'Neill et al. NUIG at EmoInt-2017: BiLSTM and SVR Ensemble to Detect Emotion Intensity, 2017, WASSA@EMNLP.

[19] Filippo Menczer et al. The rise of social bots, 2014, Commun. ACM.

[20] Rebecca J. Passonneau et al. Measuring Agreement on Set-valued Items (MASI) for Semantic and Pragmatic Annotation, 2006, LREC.

[21] Jeffrey Pennington et al. GloVe: Global Vectors for Word Representation, 2014, EMNLP.

[22] Jordan J. Louviere et al. Best-Worst Scaling: Theory, Methods and Applications, 2015.

[23] Georgios N. Yannakakis et al. Don’t Classify Ratings of Affect; Rank Them!, 2014, IEEE Transactions on Affective Computing.

[24] M. Bradley et al. Measuring emotion: the Self-Assessment Manikin and the Semantic Differential, 1994, Journal of Behavior Therapy and Experimental Psychiatry.

[25] J. Russell et al. Evidence for a three-factor theory of emotions, 1977.

[26] Saif M. Mohammad et al. Sentiment Analysis: Detecting Valence, Emotions, and Other Affectual States from Text, 2016, ArXiv.

[27] Saif Mohammad et al. Best-Worst Scaling More Reliable than Rating Scales: A Case Study on Sentiment Intensity Annotation, 2017, ACL.

[28] M. Cole. Cross-cultural universals of affective meaning, 1976.

[29] K. Scherer et al. The World of Emotions is not Two-Dimensional, 2007, Psychological Science.

[30] P. Ekman et al. Universals and Cultural Differences in the Judgments of Facial Expressions of Emotion, 2004.