DeepReading @ SardiStance 2020: Combining Textual, Social and Emotional Features

In this paper we describe our participation in the SardiStance shared task held at EVALITA 2020. We developed a set of classifiers that combine textual features, drawn from the best-performing systems based on large pre-trained language models, with user profile features such as psychological traits and social media interactions. For text-only classification we used several monolingual and multilingual Transformer models, while the non-textual features were handled by XGBoost. The textual and contextual models were then combined through a weighted-voting ensemble. Our approach obtained the best score in Task B, Contextual Stance Detection.

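To make the combination step concrete, the following is a minimal sketch of a weighted-voting ensemble over class-probability outputs from a Transformer text classifier and an XGBoost contextual model. The weights, model outputs, and function names here are illustrative assumptions, not the authors' exact configuration.

    # Hedged sketch of weighted voting over per-model class probabilities;
    # the weights and probability values below are purely illustrative.
    import numpy as np

    def weighted_vote(prob_matrices, weights):
        """Average (n_samples x n_classes) probability matrices with the given
        per-model weights and return the predicted class index per sample."""
        weights = np.asarray(weights, dtype=float)
        weights = weights / weights.sum()                   # normalize weights
        stacked = np.stack(prob_matrices)                   # (n_models, n_samples, n_classes)
        combined = np.tensordot(weights, stacked, axes=1)   # weighted average over models
        return combined.argmax(axis=1)

    # Hypothetical outputs: one matrix from a Transformer text classifier,
    # one from an XGBoost model trained on user/contextual features.
    p_text = np.array([[0.7, 0.2, 0.1], [0.3, 0.4, 0.3]])
    p_context = np.array([[0.5, 0.4, 0.1], [0.2, 0.6, 0.2]])
    print(weighted_vote([p_text, p_context], weights=[0.6, 0.4]))  # -> [0 1]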