Paper Information - Multi-resolution Annotations for Emoji Prediction - 字舞流文

Multi-resolution Annotations for Emoji Prediction

Emojis can express a range of linguistic content, including emotions, sentiments, and events. Predicting the emojis that fit a piece of text provides a way to summarize the text accurately, and emoji prediction has been shown to be a useful auxiliary task for many Natural Language Understanding (NLU) tasks. Labels in existing emoji prediction datasets are all passage-level and usually follow the multi-class classification setting. In many cases, however, a single emoji cannot fully cover the theme of a piece of text, so it is useful to infer which part of the text relates to each emoji. The lack of multi-label and aspect-level emoji prediction datasets is one of the bottlenecks for this task. This paper annotates an emoji prediction dataset with passage-level multi-class and multi-label annotations and with aspect-level multi-class annotations. We also present a novel annotation method for generating the aspect-level annotations: they are produced heuristically, taking advantage of the self-attention mechanism in Transformer networks. We validate the annotations both automatically and manually to ensure their quality, and we benchmark the dataset with a pre-trained BERT model.
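The abstract does not spell out the heuristic, so the following is only an illustrative sketch of the general idea of turning attention weights into aspect-level annotations, not the paper's actual method. It assumes one attention weight per token has already been obtained for each emoji label (for example by averaging heads of a Transformer layer) and picks the contiguous window of tokens with the largest total weight. The function name `top_attention_span`, the window width, the label names, and all weights are hypothetical.

```python
from typing import Dict, List, Tuple


def top_attention_span(weights: List[float], width: int) -> Tuple[int, int]:
    """Return (start, end) of the contiguous window of `width` tokens with
    the largest total attention weight. `end` is exclusive."""
    if not weights:
        raise ValueError("weights must be non-empty")
    # Clamp the window so it always fits inside the passage.
    width = max(1, min(width, len(weights)))
    window = sum(weights[:width])
    best_sum, best_start = window, 0
    for start in range(1, len(weights) - width + 1):
        # Slide the window one token to the right.
        window += weights[start + width - 1] - weights[start - 1]
        if window > best_sum:
            best_sum, best_start = window, start
    return best_start, best_start + width


tokens = ["great", "pizza", "but", "the", "train", "was", "late"]

# Hypothetical per-token attention weights for two emoji labels
# (made-up numbers, not taken from the paper or any model).
attention: Dict[str, List[float]] = {
    "pizza_emoji": [0.30, 0.45, 0.05, 0.05, 0.05, 0.05, 0.05],
    "angry_emoji": [0.05, 0.05, 0.10, 0.10, 0.25, 0.20, 0.25],
}

# Map each emoji label to the text span it attends to most strongly.
aspects: Dict[str, List[str]] = {}
for label, weights in attention.items():
    start, end = top_attention_span(weights, width=3)
    aspects[label] = tokens[start:end]

print(aspects)
```

In this toy passage, the two labels end up attached to different spans, which is the kind of aspect-level multi-class annotation the abstract describes, as opposed to a single passage-level label.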

Soroush Vosoughi | Weicheng Ma | Ruibo Liu | Lili Wang
