Comparing the Utility of Different Classification Schemes for Emotive Language Analysis

In this paper, we investigated the utility of different classification schemes for emotive language analysis with the aim of providing experimental justification for the choice of scheme for classifying emotions in free text. We compared six schemes: (1) Ekman's six basic emotions, (2) Plutchik's wheel of emotion, (3) Watson and Tellegen's Circumplex theory of affect, (4) the Emotion Annotation Representation Language (EARL), (5) WordNet-Affect, and (6) free text. To measure their utility, we investigated their ease of use by human annotators as well as the performance of supervised machine learning. We assembled a corpus of 500 emotionally charged text documents. The corpus was annotated manually using an online crowdsourcing platform with five independent annotators per document. Assuming that classification schemes with a better balance between completeness and complexity are easier to interpret and use, we expected such schemes to be associated with higher inter-annotator agreement. We used Krippendorff's alpha coefficient to measure inter-annotator agreement, according to which the six classification schemes were ranked as follows: (1) six basic emotions (α = 0.483), (2) wheel of emotion (α = 0.410), (3) Circumplex (α = 0.312), (4) EARL (α = 0.286), (5) free text (α = 0.205), and (6) WordNet-Affect (α = 0.202). However, correspondence analysis of annotations across the schemes highlighted that basic emotions are oversimplified representations of complex phenomena and as such are likely to lead to invalid interpretations, which are not necessarily reflected by high inter-annotator agreement. To complement the results of the quantitative analysis, we used semi-structured interviews to gain qualitative insight into how annotators interacted with and interpreted the chosen schemes. The size of the classification scheme was highlighted as a significant factor affecting annotation.
In particular, the scheme of six basic emotions was perceived as having insufficient coverage of the emotion space, often forcing annotators to resort to inferior alternatives, e.g. using happiness as a surrogate for love. At the opposite end of the spectrum, large schemes such as WordNet-Affect were linked to choice fatigue, as choosing the best annotation required significant cognitive effort. In the second part of the study, we used the annotated corpus to create six training datasets, one for each scheme. The training data were used in cross-validation experiments to evaluate classification performance in relation to the different schemes. According to the F-measure, the classification schemes were ranked as follows: (1) six basic emotions (F = 0.410), (2) Circumplex (F = 0.341), (3) wheel of emotion (F = 0.293), (4) EARL (F = 0.254), (5) free text (F = 0.159), and (6) WordNet-Affect (F = 0.158). Not surprisingly, the smallest scheme was ranked the highest on both criteria. Therefore, out of the six schemes studied here, six basic emotions are best suited for emotive language analysis. However, both the quantitative and qualitative analyses highlighted its major shortcoming: the oversimplification of positive emotions, which are all conflated into happiness. Further investigation is needed into ways of better balancing positive and negative emotions.
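
For readers unfamiliar with the agreement measure used above, the following is a minimal sketch of Krippendorff's alpha for nominal labels, computed from a coincidence matrix. It is an illustration only, not the implementation used in the study: the function name `krippendorff_alpha_nominal`, the input layout (one list of labels per document) and the example labels are assumptions, and it assumes each annotator assigns exactly one label per document.

```python
from collections import Counter
from itertools import permutations


def krippendorff_alpha_nominal(units):
    """Krippendorff's alpha for nominal data.

    `units` is a list of documents; each document is a list holding the
    labels assigned by its annotators (hypothetical layout, one label per
    annotator). Documents with fewer than two labels are skipped.
    """
    # Coincidence matrix: every ordered pair of labels given to the same
    # document by two different annotators contributes 1 / (m - 1).
    coincidences = Counter()
    for labels in units:
        m = len(labels)
        if m < 2:
            continue
        for a, b in permutations(labels, 2):
            coincidences[(a, b)] += 1.0 / (m - 1)

    # Marginal totals per label and the overall number of pairable labels.
    totals = Counter()
    for (a, _b), weight in coincidences.items():
        totals[a] += weight
    n = sum(totals.values())

    # Observed and expected disagreement (nominal metric: 1 if labels differ).
    observed = sum(w for (a, b), w in coincidences.items() if a != b)
    expected = sum(
        totals[a] * totals[b] for a in totals for b in totals if a != b
    ) / (n - 1)
    if expected == 0:
        raise ValueError("alpha is undefined when only one label is used")
    return 1.0 - observed / expected


# Illustrative toy data: four documents, two annotators each.
toy = [["joy", "joy"], ["joy", "anger"], ["anger", "anger"], ["anger", "anger"]]
alpha = krippendorff_alpha_nominal(toy)
```

A value of 1 indicates perfect agreement and 0 indicates agreement at chance level, which is the scale on which the α values reported above should be read.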
