From Emoji Usage to Categorical Emoji Prediction

Emoji usage has increased drastically in recent years, and emojis are becoming one of the most common ways to convey emotions and sentiments in social messaging applications. Several research works automatically recommend emojis so that users do not have to search through a library of thousands of emojis. To improve emoji recommendation, we present and distribute two resources: an emoji embedding model learned from real usage, and an emoji clustering based on these embeddings that automatically identifies groups of emojis. Assuming that emojis are part of written natural language and can be treated as words, we used only unsupervised learning methods to extract patterns and knowledge from real emoji usage in tweets. As a result, emotion categories of face emojis were obtained directly from text in a fully reproducible way. These resources and this methodology have multiple uses; for example, they could improve our understanding of emojis or enhance emoji recommendation.
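The pipeline summarized above (treat emojis as words, represent each one by the contexts it appears in, then group similar emojis) can be illustrated with a minimal sketch. This is not the embedding model or the clustering distributed with this work: count-based co-occurrence vectors stand in for a learned embedding, a greedy similarity-threshold grouping stands in for the clustering step, and the toy tweets, the names `context_vectors` and `group_by_similarity`, and the `threshold` value are all hypothetical choices made for illustration.

```python
import math
from collections import Counter, defaultdict

# Hypothetical toy corpus: tweets are pre-tokenized and emojis are kept
# as ordinary tokens, following the "emojis are words" assumption.
TWEETS = [
    ["so", "happy", "today", "😀"],
    ["happy", "birthday", "😀"],
    ["so", "happy", "for", "you", "😊"],
    ["happy", "weekend", "😊"],
    ["so", "sad", "today", "😢"],
    ["sad", "news", "😢"],
    ["feeling", "sad", "and", "alone", "😭"],
    ["so", "sad", "😭"],
]
EMOJIS = {"😀", "😊", "😢", "😭"}


def context_vectors(tweets, targets):
    """Count the non-target words that co-occur with each target token."""
    vectors = defaultdict(Counter)
    for tokens in tweets:
        for tok in tokens:
            if tok in targets:
                vectors[tok].update(t for t in tokens if t not in targets)
    return dict(vectors)


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[k] * b[k] for k in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def group_by_similarity(vectors, threshold=0.5):
    """Greedy grouping (an illustrative stand-in for a real clustering
    algorithm): an emoji joins the first group that already contains a
    member whose similarity reaches the threshold, else starts a new group."""
    groups = []
    for emoji in sorted(vectors):
        for group in groups:
            if any(cosine(vectors[emoji], vectors[m]) >= threshold for m in group):
                group.append(emoji)
                break
        else:
            groups.append([emoji])
    return groups


vectors = context_vectors(TWEETS, EMOJIS)
groups = group_by_similarity(vectors)
print(groups)
```

On this toy corpus, the emojis used in "happy" contexts and those used in "sad" contexts end up in separate groups without any labels being supplied, which is the unsupervised behavior the abstract describes. A realistic setup would replace the count vectors with a trained embedding model and the greedy grouping with a proper clustering algorithm.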
