AraStance: A Multi-Country and Multi-Domain Dataset of Arabic Stance Detection for Fact Checking

With the continuing spread of misinformation and disinformation online, it is increasingly important to develop scalable, automated countermeasures that support multiple languages. One task of interest is claim veracity prediction, which can be addressed using stance detection with respect to relevant documents retrieved online. To this end, we present a new Arabic Stance Detection dataset (AraStance) of 4,063 claim–article pairs drawn from a diverse set of sources: three fact-checking websites and one news website. AraStance covers false and true claims from multiple domains (e.g., politics, sports, health) and several Arab countries, and it is well balanced between documents that are related and unrelated to the claims. We benchmark AraStance, along with two other stance detection datasets, using a number of BERT-based models. Our best model achieves an accuracy of 85% and a macro F1 score of 78%, which leaves room for improvement and reflects the challenging nature of AraStance and of stance detection in general.
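The gap between the reported accuracy and macro F1 is typical when stance classes differ in size, since macro F1 averages per-class F1 scores and weights every class equally. The following is a minimal sketch of how the two metrics can diverge. The label names and the toy predictions are assumptions made for illustration only; they are not taken from AraStance or from the benchmarked models.

```python
def accuracy(gold, pred):
    """Fraction of examples whose predicted label matches the gold label."""
    return sum(g == p for g, p in zip(gold, pred)) / len(gold)


def macro_f1(gold, pred):
    """Unweighted mean of per-class F1 scores over all labels seen."""
    labels = sorted(set(gold) | set(pred))
    scores = []
    for label in labels:
        tp = sum(g == label and p == label for g, p in zip(gold, pred))
        fp = sum(g != label and p == label for g, p in zip(gold, pred))
        fn = sum(g == label and p != label for g, p in zip(gold, pred))
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        denom = precision + recall
        scores.append(2 * precision * recall / denom if denom else 0.0)
    return sum(scores) / len(scores)


# Hypothetical toy data: label names and counts are illustrative assumptions.
gold = ["unrelated"] * 6 + ["agree"] * 2 + ["disagree"] + ["discuss"]
pred = ["unrelated"] * 6 + ["agree"] * 2 + ["agree"] + ["discuss"]

print(f"accuracy = {accuracy(gold, pred):.2f}")
print(f"macro F1 = {macro_f1(gold, pred):.2f}")
```

In this toy case a single error on the rarest class costs one tenth of the accuracy but drives that class's F1 to zero, so macro F1 drops much further than accuracy does.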
