Presenting TWITTIRÒ-UD: An Italian Twitter Treebank in Universal Dependencies

In this paper we describe an early-stage application of Universal Dependencies to an Italian social media corpus developed for shared tasks on irony and stance detection. The development of this novel resource (TWITTIRÒ-UD) serves a twofold goal: it enriches the set of treebanks available for social media and for Italian, and it paves the way for more reliable extraction of a wider variety of morphological and syntactic features for use by sentiment analysis tools. On the one hand, social media texts are especially hard to parse, and the limited amount of resources for training and testing NLP tools aggravates the problem. On the other hand, we thought that adding the Universal Dependencies format to the fine-grained irony annotation previously applied to TWITTIRÒ might meaningfully help in investigating possible relationships between the syntax and the semantics of figurative language, irony in particular.
