Multilingual Stance Detection: The Catalonia Independence Corpus

Stance detection aims to determine the attitude of a given text with respect to a specific topic or claim. While stance detection has been fairly well researched in recent years, most of the work has focused on English. This is mainly due to the relative lack of annotated data in other languages. The TW-10 Referendum Dataset released at IberEval 2018 is a previous effort to provide multilingual stance-annotated data in Catalan and Spanish. Unfortunately, the TW-10 Catalan subset is extremely imbalanced. This paper addresses these issues by presenting a new multilingual dataset for stance detection in Twitter for the Catalan and Spanish languages, with the aim of facilitating research on stance detection in multilingual and cross-lingual settings. The dataset is annotated with stance towards one topic, namely, the independence of Catalonia. We also provide a semi-automatic method to annotate the dataset based on a categorization of Twitter users. We experiment on the new corpus with a number of supervised approaches, including linear classifiers and deep learning methods. Comparison of our new corpus with the TW-10 dataset shows both the benefits and the potential of a well-balanced corpus for multilingual and cross-lingual research on stance detection. Finally, we establish new state-of-the-art results on the TW-10 dataset, both for Catalan and Spanish.
