Multi-resolution Annotations for Emoji Prediction

Emojis can express a range of linguistic components, including emotions, sentiments, and events. Predicting the emojis associated with a text provides a way to summarize that text accurately, and emoji prediction has been shown to be a useful auxiliary task for many Natural Language Understanding (NLU) tasks. Labels in existing emoji prediction datasets are all passage-level and are usually framed as multi-class classification. In many cases, however, a single emoji cannot fully cover the theme of a piece of text, so it is useful to infer which part of the text relates to each emoji. The lack of multi-label and aspect-level emoji prediction datasets is one of the bottlenecks for this task. This paper annotates an emoji prediction dataset with passage-level multi-class and multi-label annotations, as well as aspect-level multi-class annotations. We also present a novel method for generating the aspect-level annotations heuristically, taking advantage of the self-attention mechanism in Transformer networks. We validate the annotations both automatically and manually to ensure their quality, and we benchmark the dataset with a pre-trained BERT model.
