Multilingual Irony Detection with Dependency Syntax and Neural Models

This paper presents an in-depth investigation of the effectiveness of dependency-based syntactic features for irony detection from a multilingual perspective (English, Spanish, French and Italian). It focuses on the contribution of syntactic knowledge, exploiting linguistic resources in which syntax is annotated according to the Universal Dependencies scheme. Three experimental settings are considered. In the first, a variety of syntactic dependency-based features are combined with classical machine learning classifiers. In the second, two well-known types of word embeddings are trained on parsed data and tested against gold-standard datasets. In the third, dependency-based syntactic features are combined with the Multilingual BERT architecture. The results suggest that fine-grained dependency-based syntactic information is informative for irony detection.
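In the first setting, features are read off a dependency parse and fed to a classical classifier. The sketch below is a minimal illustration of that idea. It assumes the input is already annotated in CoNLL-U, the Universal Dependencies file format. It counts relation labels and (head UPOS, relation, dependent UPOS) triples for each sentence. The function names, the sample sentence and its annotation are hypothetical. The feature set is a simplified stand-in, not the one used in the paper.

```python
from collections import Counter


def parse_conllu(text):
    """Return a list of sentences; each sentence is a list of token dicts."""
    sentences, current = [], []
    for line in text.splitlines():
        if not line.strip():
            # A blank line closes the current sentence
            if current:
                sentences.append(current)
                current = []
            continue
        if line.startswith("#"):
            continue  # sentence-level comments such as "# text = ..."
        cols = line.split("\t")
        if len(cols) != 10:
            raise ValueError(f"expected 10 columns, got {len(cols)}: {line!r}")
        tok_id = cols[0]
        # Skip multiword-token ranges (e.g. "1-2") and empty nodes (e.g. "3.1")
        if "-" in tok_id or "." in tok_id:
            continue
        current.append({
            "id": int(tok_id),
            "form": cols[1],
            "lemma": cols[2],
            "upos": cols[3],
            "head": int(cols[6]),
            "deprel": cols[7],
        })
    if current:
        sentences.append(current)
    return sentences


def dependency_features(sentence):
    """Count relation labels and (head UPOS, relation, dependent UPOS) triples."""
    by_id = {tok["id"]: tok for tok in sentence}
    feats = Counter()
    for tok in sentence:
        rel = tok["deprel"]
        feats[f"rel={rel}"] += 1
        # HEAD = 0 marks the root of the tree
        head_upos = "ROOT" if tok["head"] == 0 else by_id[tok["head"]]["upos"]
        feats[f"triple={head_upos}>{rel}>{tok['upos']}"] += 1
    return feats


# Hypothetical hand-annotated example (ID, FORM, LEMMA, UPOS, XPOS, FEATS, HEAD, DEPREL, DEPS, MISC)
ROWS = [
    ("1", "I", "I", "PRON", "PRP", "_", "2", "nsubj", "_", "_"),
    ("2", "love", "love", "VERB", "VBP", "_", "0", "root", "_", "_"),
    ("3", "waiting", "wait", "VERB", "VBG", "_", "2", "xcomp", "_", "_"),
    ("4", "for", "for", "ADP", "IN", "_", "5", "case", "_", "_"),
    ("5", "hours", "hour", "NOUN", "NNS", "_", "3", "obl", "_", "SpaceAfter=No"),
    ("6", "!", "!", "PUNCT", ".", "_", "2", "punct", "_", "_"),
]
SAMPLE = "# text = I love waiting for hours!\n" + "\n".join("\t".join(r) for r in ROWS) + "\n"

if __name__ == "__main__":
    for sent in parse_conllu(SAMPLE):
        print(dependency_features(sent).most_common())
```

Each sentence yields a `Counter` mapping feature strings to counts. A list of such mappings could then be vectorized, for instance with scikit-learn's `DictVectorizer`, and passed to a classical classifier such as `LinearSVC` or `LogisticRegression`. That pipeline is an illustrative assumption, not a description of the paper's exact configuration.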
