NTUA-SLP at SemEval-2018 Task 3: Tracking Ironic Tweets using Ensembles of Word and Character Level Attentive RNNs

In this paper, we present two deep-learning systems that competed at SemEval-2018 Task 3 "Irony detection in English tweets". We design and ensemble two independent models, based on recurrent neural networks (Bi-LSTMs), which operate at the word and character level to capture both the semantic and syntactic information in tweets. Our models are augmented with a self-attention mechanism to identify the most informative words. The embedding layer of our word-level model is initialized with word2vec embeddings, pretrained on a collection of 550 million English tweets. We did not utilize any handcrafted features, lexicons, or external datasets as prior information; our models are trained end-to-end with backpropagation on the constrained data. Furthermore, we provide visualizations of tweets, annotated with the salient tokens identified by the attention layer, which help interpret the inner workings of the proposed models. We ranked 2nd out of 42 teams in Subtask A and 2nd out of 31 teams in Subtask B. However, post-task-completion enhancements of our models achieve state-of-the-art results, ranking 1st in both subtasks.
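
The core building block described above is a Bi-LSTM encoder with a self-attention layer on top, whose weights can also be read off for the token-saliency visualizations. Below is a minimal PyTorch sketch of such a word-level attentive Bi-LSTM classifier; the layer sizes, dropout rates, and single-layer attention form are illustrative assumptions, not the exact configuration of the submitted system.

```python
# Minimal sketch of a word-level Bi-LSTM with self-attention for binary
# irony classification (assumed hyperparameters, not the paper's exact setup).
import torch
import torch.nn as nn


class AttentiveBiLSTM(nn.Module):
    def __init__(self, vocab_size, emb_dim=300, hidden_dim=150,
                 num_classes=2, pretrained_embeddings=None, dropout=0.3):
        super().__init__()
        self.embedding = nn.Embedding(vocab_size, emb_dim, padding_idx=0)
        if pretrained_embeddings is not None:
            # e.g. word2vec vectors pretrained on a large tweet collection
            self.embedding.weight.data.copy_(pretrained_embeddings)
        self.lstm = nn.LSTM(emb_dim, hidden_dim,
                            batch_first=True, bidirectional=True)
        # Self-attention: a learned relevance score for each hidden state
        self.attention = nn.Linear(2 * hidden_dim, 1)
        self.dropout = nn.Dropout(dropout)
        self.classifier = nn.Linear(2 * hidden_dim, num_classes)

    def forward(self, token_ids, mask):
        # token_ids, mask: (batch, seq_len); mask is 1 for tokens, 0 for padding
        h, _ = self.lstm(self.dropout(self.embedding(token_ids)))  # (batch, seq_len, 2*hidden)
        scores = self.attention(h).squeeze(-1)                     # (batch, seq_len)
        scores = scores.masked_fill(mask == 0, float('-inf'))
        weights = torch.softmax(scores, dim=-1)                    # attention over tokens
        context = torch.bmm(weights.unsqueeze(1), h).squeeze(1)    # (batch, 2*hidden)
        return self.classifier(self.dropout(context)), weights
```

The returned `weights` are what the attention-based visualizations would highlight for each tweet. A character-level counterpart can reuse the same structure over character embeddings; how the two models' predictions are combined in the final ensemble is a detail of the full system and is not specified here.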
