EmojiNet: An Open Service and API for Emoji Sense Discovery

This paper presents the release of EmojiNet, the largest machine-readable emoji sense inventory that links Unicode emoji representations to their English meanings extracted from the Web. EmojiNet is a dataset consisting of: (i) 12,904 sense labels over 2,389 emoji, which were extracted from the Web and linked to machine-readable sense definitions in BabelNet; (ii) context words associated with each emoji sense, which are inferred through word embedding models trained on a Google News corpus and a Twitter message corpus for each emoji sense definition; and (iii) for a selected set of emoji, the most likely platform-specific emoji sense, specified in recognition of discrepancies in how emoji are presented on different platforms. The dataset is hosted as an open service with a REST API and is available at this http URL. The paper discusses the development of this dataset, the evaluation of its quality, and its applications, including emoji sense disambiguation and emoji sense similarity.
