A Heuristic-driven Ensemble Framework for COVID-19 Fake News Detection

The significance of social media has grown manifold over the past few decades, as it helps people in even the most remote corners of the world stay connected. During the COVID-19 pandemic, social media has become more relevant and widely used than ever before, and with this has come a resurgence in the circulation of fake news and tweets that demands immediate attention. In this paper, we describe our fake news detection system, which automatically identifies whether a tweet related to COVID-19 is “real” or “fake”, developed as part of the CONSTRAINT COVID19 Fake News Detection in English challenge. Our ensemble of pre-trained models placed joint 8th on the leaderboard, with an F1-score of 0.9831 against a top score of 0.9869. After the competition, we improved the system by incorporating a novel heuristic algorithm based on the username handles and link domains in tweets, reaching an F1-score of 0.9883 and achieving state-of-the-art results on the given dataset.
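To make the idea concrete, the following is a minimal sketch of how an ensemble prediction might be combined with a handle- and domain-based heuristic. It is an illustration under stated assumptions, not the system described in this paper: the averaging scheme (soft voting), the function names, the `attr_prior` lookup table, and the `threshold` value are all hypothetical.

```python
import re
from urllib.parse import urlparse

# Hypothetical patterns for pulling username handles and links out of a tweet.
HANDLE_RE = re.compile(r"@(\w+)")
URL_RE = re.compile(r"https?://\S+")


def soft_vote(prob_rows):
    """Average per-model [p_real, p_fake] vectors (soft voting, assumed here)."""
    n = len(prob_rows)
    return [sum(col) / n for col in zip(*prob_rows)]


def extract_attributes(tweet):
    """Return lowercased username handles and link domains found in the tweet."""
    handles = [h.lower() for h in HANDLE_RE.findall(tweet)]
    domains = [urlparse(u).netloc.lower() for u in URL_RE.findall(tweet)]
    return handles + domains


def classify(tweet, model_probs, attr_prior, threshold=0.7):
    """Label a tweet "real" or "fake".

    model_probs: one [p_real, p_fake] pair per model in the ensemble.
    attr_prior:  hypothetical table mapping a handle or domain to a
                 (p_real, p_fake) pair estimated from training data.
    threshold:   assumed confidence above which the prior overrides
                 the ensemble output.
    """
    p_real, p_fake = soft_vote(model_probs)
    for attr in extract_attributes(tweet):
        prior = attr_prior.get(attr)
        if prior is not None and max(prior) >= threshold:
            p_real, p_fake = prior  # a confident attribute overrides the ensemble
            break
    return "real" if p_real >= p_fake else "fake"


if __name__ == "__main__":
    probs = [[0.40, 0.60], [0.45, 0.55]]  # made-up model outputs
    priors = {"cdcgov": (0.95, 0.05)}     # made-up prior for one handle
    print(classify("@CDCgov reports new guidance https://cdc.gov/x", probs, priors))
    print(classify("Garlic cures the virus", probs, priors))
```

In this sketch the heuristic only overrides the ensemble when an attribute's prior is sufficiently one-sided; tweets with no known handle or domain fall back to the averaged model probabilities.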
