MM-COVID: A Multilingual and Multidimensional Data Repository for Combating COVID-19 Fake News

The COVID-19 epidemic is considered a global health crisis for the whole of society and the greatest challenge humanity has faced since World War Two. Unfortunately, fake news about COVID-19 is spreading as fast as the virus itself. The resulting incorrect health measures, anxiety, and hate speech will harm people's physical and mental health worldwide. To help combat COVID-19 fake news, we propose a new fake news detection dataset, MM-COVID (Multilingual and Multidimensional COVID-19 Fake News Data Repository). This dataset provides multilingual fake news and the relevant social context. We collect 3981 pieces of fake news content and 7192 pieces of trustworthy information in six languages: English, Spanish, Portuguese, Hindi, French, and Italian. We present a detailed exploratory analysis of MM-COVID from different perspectives and demonstrate its utility in several potential applications of COVID-19 fake news research on multilingual content and social media.
