DamascusTeam at NLP4IF2021: Fighting the Arabic COVID-19 Infodemic on Twitter Using AraBERT

The objective of this work was to introduce an effective approach, based on the AraBERT language model, for fighting the Arabic COVID-19 infodemic on Twitter. The approach was arranged as a two-step pipeline. The first step applied a series of pre-processing procedures that transform Twitter jargon, including emojis and emoticons, into plain text. The second step fine-tuned a version of AraBERT that had been pre-trained on plain text and used it to classify the tweets with respect to their labels. Using language models pre-trained on plain text rather than on tweets was motivated by two critical issues highlighted in the scientific literature: (1) pre-trained language models are widely available in many languages, which avoids the time-consuming and resource-intensive process of training a model on tweets from scratch and lets the effort focus on fine-tuning alone; and (2) available plain-text corpora are larger than tweet-only corpora, which allows for better performance.
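The first pipeline step, turning Twitter jargon into plain text, can be illustrated with a minimal sketch. The lookup tables, the Arabic replacement words, and the function name `preprocess_tweet` are hypothetical and chosen only for illustration. They are not the authors' actual mappings or code. The sketch assumes that URLs, user mentions, hashtags, emoticons, and emojis are the jargon to normalize.

```python
import re

# Hypothetical lookup tables: the mappings actually used by the authors are not
# given in the abstract, so these few entries are illustrative only.
EMOTICONS = {
    ":-)": "ابتسامة",  # smile
    ":)": "ابتسامة",
    ":-(": "حزن",      # sadness
    ":(": "حزن",
}
EMOJIS = {
    "\U0001F602": "ضحك",    # face with tears of joy -> "laughter"
    "\U0001F622": "حزن",    # crying face -> "sadness"
    "\U0001F44D": "موافق",  # thumbs up -> "agree"
}

URL_RE = re.compile(r"https?://\S+")
MENTION_RE = re.compile(r"@\w+")
HASHTAG_RE = re.compile(r"#(\w+)")


def _replace_tokens(text: str, table: dict[str, str]) -> str:
    """Replace every key of `table` with its word, longest keys first."""
    for key in sorted(table, key=len, reverse=True):
        text = text.replace(key, f" {table[key]} ")
    return text


def preprocess_tweet(text: str) -> str:
    """Hypothetical sketch: map Twitter-specific tokens to plain Arabic text."""
    text = URL_RE.sub(" رابط ", text)       # URLs -> "link"
    text = MENTION_RE.sub(" مستخدم ", text)  # @mentions -> "user"
    # Keep the hashtag words but drop '#' and split on underscores.
    text = HASHTAG_RE.sub(lambda m: m.group(1).replace("_", " "), text)
    text = _replace_tokens(text, EMOTICONS)
    text = _replace_tokens(text, EMOJIS)
    return " ".join(text.split())  # collapse repeated whitespace


if __name__ == "__main__":
    print(preprocess_tweet("@WHO نصائح مهمة :) \U0001F44D #كورونا_فيروس https://t.co/abc"))
```

In this sketch, URLs are replaced before emoticons so that character sequences inside a link cannot be mistaken for an emoticon. The cleaned strings would then be tokenized and passed to AraBERT for fine-tuning, a step omitted here.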
