Incorporating Emoji Descriptions Improves Tweet Classification

Tweets are short messages that often include specialized language such as hashtags and emojis. In this paper, we present a simple strategy for processing emojis: replace each emoji with its natural language description and then use pretrained word embeddings, as is normally done with standard words. We show that this strategy is more effective than using pretrained emoji embeddings for tweet classification. Specifically, we obtain new state-of-the-art results in irony detection and sentiment analysis, even though our neural network is simpler than previous proposals.
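The replacement step can be illustrated with a minimal Python sketch. It rests on assumptions not stated in the abstract: the Unicode character name stands in for the emoji's natural language description, and any character in the Unicode category "So" (Symbol, other) is treated as an emoji, which also catches a few non-emoji symbols such as the copyright sign. The function name `describe_emojis` is hypothetical.

```python
import unicodedata


def describe_emojis(text: str) -> str:
    """Replace each emoji-like character with its lowercased Unicode name.

    Assumption: the Unicode name approximates the emoji description, and
    category "So" approximates "is an emoji". Both are simplifications.
    """
    pieces = []
    for ch in text:
        if ch == "\ufe0f":
            # Drop the emoji variation selector; it carries no meaning here.
            continue
        if unicodedata.category(ch) == "So":
            name = unicodedata.name(ch, "")
            if name:
                # Pad with spaces so the description splits into word tokens.
                pieces.append(" " + name.lower() + " ")
                continue
        pieces.append(ch)
    # Collapse the extra whitespace introduced by the padding.
    return " ".join("".join(pieces).split())


if __name__ == "__main__":
    print(describe_emojis("great day 😂"))
    # great day face with tears of joy
```

After this step the tweet contains only ordinary words, so each token can be looked up in a standard pretrained word embedding table without a separate emoji vocabulary.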
