Innovative Semi-Automatic Methodology to Annotate Emotional Corpora

Detecting depression or personality traits, tutoring and student behaviour systems, and identifying cases of cyberbullying are a few of the wide range of applications in which the automatic detection of emotion is a crucial element. Emotion detection has the potential for high impact, benefiting business, society, politics and education. Given this context, the main objective of our research is to contribute to resolving one of the most important challenges in the textual emotion detection task: the annotation of emotional corpora. We tackle this by proposing a new semi-automatic methodology. Our methodology consists of two main phases: (1) an automatic process that pre-annotates unlabelled sentences with a reduced number of emotional categories; and (2) a manual refinement process in which human annotators determine which is the predominant emotion among the emotional categories selected in phase 1. In this paper, we present and evaluate the pre-annotation process in order to analyse the feasibility and benefits of the proposed methodology. The results obtained are promising: they show a substantial reduction in annotation time and cost and confirm the usefulness of our pre-annotation process for improving the annotation task.
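The two-phase flow described above can be illustrated with a minimal sketch. It is not the pre-annotation process evaluated in the paper: the toy lexicon, the function names `pre_annotate` and `refine`, the `max_candidates` parameter and the hit-count ranking are all assumptions made for illustration only. The sketch shows phase 1 narrowing each sentence to a few candidate emotions and phase 2 leaving the final choice to a human annotator, here simulated by a callback.

```python
from collections import Counter

# Hypothetical toy lexicon mapping words to emotion categories.
# Illustrative only; it is not the resource used in the paper.
TOY_LEXICON = {
    "happy": {"joy"},
    "smile": {"joy"},
    "afraid": {"fear"},
    "scared": {"fear"},
    "cry": {"sadness"},
    "lost": {"sadness", "fear"},
    "furious": {"anger"},
}


def pre_annotate(sentence, lexicon=TOY_LEXICON, max_candidates=2):
    """Phase 1 (sketch): propose a reduced set of candidate emotions.

    Candidates are ranked by the number of lexicon hits in the sentence;
    ties are broken alphabetically so the output is deterministic.
    """
    tokens = [t.strip(".,;:!?\"'").lower() for t in sentence.split()]
    counts = Counter()
    for token in tokens:
        for emotion in lexicon.get(token, ()):
            counts[emotion] += 1
    ranked = sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))
    return [emotion for emotion, _ in ranked[:max_candidates]]


def refine(candidates, choose):
    """Phase 2 (sketch): a human picks the predominant emotion.

    `choose` stands in for the annotator: it receives the candidate list
    and returns one of its elements. Returns None when phase 1 found no
    candidates (an assumption of this sketch).
    """
    if not candidates:
        return None
    choice = choose(candidates)
    if choice not in candidates:
        raise ValueError("annotator must pick one of the proposed candidates")
    return choice


if __name__ == "__main__":
    sentence = "I was scared and lost, so I began to cry."
    candidates = pre_annotate(sentence)
    # Simulated annotator decision: pick the second candidate.
    label = refine(candidates, choose=lambda options: options[1])
    print(candidates, label)
```

In this sketch the annotator only chooses among the pre-selected categories instead of the full emotion set, which is the source of the time and cost savings the methodology aims for.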
