No Rumours Please! A Multi-Indic-Lingual Approach for COVID Fake-Tweet Detection

The sudden, widespread menace of the COVID-19 global pandemic has had an unprecedented effect on our lives. Mankind is experiencing immense fear and a dependence on social media like never before. Fear inevitably leads to panic, speculation, and the spread of misinformation. Many governments have taken measures to curb the spread of such misinformation for public well-being. Beyond global measures, systems for demographically local languages have an important role to play in achieving effective outreach. To this end, we propose an approach to detect fake news about COVID-19 early on from social media posts, such as tweets, in multiple Indic languages besides English. In addition, we create an annotated dataset of Hindi and Bengali tweets for fake news detection. We propose a BERT-based model augmented with additional relevant features extracted from Twitter to identify fake tweets. To extend our approach to multiple Indic languages, we use an mBERT-based model fine-tuned on the created Hindi and Bengali dataset. We also propose a zero-shot learning approach to alleviate the data scarcity issue for such low-resource languages. Through rigorous experiments, we show that our approach reaches around 89% F-Score in fake tweet detection, which surpasses the state-of-the-art (SOTA) results. Moreover, we establish the first benchmark for two Indic languages, Hindi and Bengali. Using our annotated data, our model achieves about 79% F-Score on Hindi tweets and 81% F-Score on Bengali tweets. Our zero-shot model achieves about 81% F-Score on Hindi tweets and 78% F-Score on Bengali tweets without any annotated data, which indicates the efficacy of our approach.
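As a rough illustration of what "a BERT-based model augmented with additional features" might look like, the sketch below fuses a text embedding with tweet-level features and scores the result with a logistic layer. The abstract does not say which features are used or how they are combined, so everything here is an assumption made for illustration: fusion by concatenation, the feature names, the toy three-dimensional embedding standing in for an encoder output, and the hand-picked weights.

```python
import math
from typing import List, Sequence


def fuse_features(text_embedding: Sequence[float],
                  tweet_features: Sequence[float]) -> List[float]:
    """Concatenate a text embedding with auxiliary tweet-level features.

    Concatenation is an assumed fusion strategy, not one confirmed by the source.
    """
    return list(text_embedding) + list(tweet_features)


def predict_fake_probability(fused: Sequence[float],
                             weights: Sequence[float],
                             bias: float) -> float:
    """Score a fused vector with a single logistic (sigmoid) layer."""
    if len(fused) != len(weights):
        raise ValueError("fused vector and weights must have the same length")
    logit = sum(x * w for x, w in zip(fused, weights)) + bias
    return 1.0 / (1.0 + math.exp(-logit))


# Toy stand-in for an encoder output; a real BERT/mBERT vector has hundreds of dimensions.
text_embedding = [0.2, -0.1, 0.4]

# Hypothetical tweet-level features (names are illustrative only):
# [contains_url, author_is_verified]
tweet_features = [1.0, 0.0]

fused = fuse_features(text_embedding, tweet_features)

# Hand-picked weights for illustration; in practice these would be learned.
weights = [0.5, -0.3, 0.8, 1.2, -0.7]
bias = -0.1

prob = predict_fake_probability(fused, weights, bias)
print(f"Estimated probability that the tweet is fake: {prob:.3f}")
```

In a trained system the embedding would come from the fine-tuned encoder and the classifier weights would be learned jointly with it; the sketch only shows the shape of the data flow.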
