Transformers to Fight the COVID-19 Infodemic

The massive spread of false information on social media has become a global risk, especially during a global pandemic such as COVID-19. False information detection has consequently become a rapidly growing research topic in recent months. The NLP4IF-2021 shared task on fighting the COVID-19 infodemic was organised to strengthen research in false information detection; participants are asked to predict seven binary labels regarding false information in a tweet. The shared task covers three languages: Arabic, Bulgarian and English. In this paper, we present our transformer-based approach to the task. Overall, our approach achieves a mean F1 score of 0.707 in Arabic, 0.578 in Bulgarian and 0.864 in English, ranking 4th in all three languages.
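To make the reported metric concrete, the following is a minimal sketch of how a mean F1 score over several binary labels could be computed. It assumes the mean is the unweighted average of the per-label F1 scores on the positive class; the shared task's official scorer may weight or average differently. The function names `binary_f1` and `mean_f1` and the toy data are illustrative only and are not taken from the paper.

```python
from typing import List, Sequence


def binary_f1(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    """F1 score of the positive class (1) for a single binary label."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    if tp == 0:
        # No true positives: precision or recall is zero (or undefined),
        # so F1 is taken to be 0 by convention.
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def mean_f1(y_true_rows: List[Sequence[int]],
            y_pred_rows: List[Sequence[int]]) -> float:
    """Unweighted mean of per-label F1 scores (assumed averaging scheme).

    Each row is one tweet; each column is one binary label
    (seven columns in the shared task, two in the toy example below).
    """
    n_labels = len(y_true_rows[0])
    scores = [
        binary_f1([row[j] for row in y_true_rows],
                  [row[j] for row in y_pred_rows])
        for j in range(n_labels)
    ]
    return sum(scores) / n_labels


# Toy example with three tweets and two labels (hypothetical data).
gold = [[1, 0], [1, 1], [0, 1]]
pred = [[1, 0], [0, 1], [0, 1]]
score = mean_f1(gold, pred)
```

In this toy example the first label has precision 1 and recall 0.5, while the second label is predicted perfectly, so the mean lies between the two per-label scores.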
