DamascusTeam at NLP4IF2021: Fighting the Arabic COVID-19 Infodemic on Twitter Using AraBERT

This work introduces an effective approach, based on the AraBERT language model, for fighting the COVID-19 infodemic on Twitter. The approach is a two-step pipeline: the first step applies a series of pre-processing procedures that transform Twitter jargon, including emojis and emoticons, into plain text; the second step fine-tunes a version of AraBERT pre-trained on plain text to classify the tweets with respect to their labels. Using language models pre-trained on plain text rather than on tweets was motivated by two critical issues identified in the scientific literature: (1) pre-trained language models are widely available in many languages, which avoids the time-consuming and resource-intensive training of a model on tweets from scratch and lets the effort focus on fine-tuning alone; (2) available plain-text corpora are larger than tweet-only ones, allowing for better performance.
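The first step of the pipeline can be pictured with a minimal Python sketch. It is an illustration under stated assumptions, not the authors' implementation: the `EMOJI_TO_TEXT` table, the `normalize_tweet` function name, and the handling of URLs, mentions, and hashtags are all hypothetical, since the abstract only states that Twitter jargon, including emojis and emoticons, is converted to plain text.

```python
import re

# Hypothetical mapping for illustration only; the abstract does not give
# the actual emoji/emoticon table used by the authors.
EMOJI_TO_TEXT = {
    "😂": "face with tears of joy",
    "😷": "face with medical mask",
    ":)": "smiling face",
    ":(": "sad face",
}

URL_RE = re.compile(r"https?://\S+")
MENTION_RE = re.compile(r"@\w+")


def normalize_tweet(text: str) -> str:
    """Turn Twitter jargon into plain text (assumed rules, see lead-in)."""
    # Assumption: URLs and user mentions carry no useful signal and are dropped.
    text = URL_RE.sub(" ", text)
    text = MENTION_RE.sub(" ", text)
    # Replace each known emoji or emoticon with its textual description.
    for symbol, description in EMOJI_TO_TEXT.items():
        text = text.replace(symbol, f" {description} ")
    # Assumption: hashtags are kept as ordinary words.
    text = text.replace("#", " ").replace("_", " ")
    # Collapse the whitespace introduced by the substitutions above.
    return " ".join(text.split())


if __name__ == "__main__":
    print(normalize_tweet("@user stay home 😷 #COVID_19 https://t.co/abc :)"))
```

In the second step, the normalized text would be passed to the AraBERT tokenizer and fine-tuned classifier; that part is not shown here because the abstract gives no model or training details.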

Nada Ghneim | Ammar Joukhadar | Ahmad Hussein
