ReCOVery: A Multimodal Repository for COVID-19 News Credibility Research

The COVID-19 outbreak, first identified in Wuhan, China, in December 2019, was declared a global emergency in January 2020 and a pandemic in March 2020 by the World Health Organization (WHO). Along with this pandemic, we are also experiencing an "infodemic" of low-credibility information such as fake news and conspiracy theories. In this work, we present ReCOVery, a repository designed and constructed to facilitate research on combating such information regarding COVID-19. We first broadly search and investigate ~2,000 news publishers, from which 60 are identified as having extreme (high or low) levels of credibility. Each news article inherits the credibility of the publisher on which it appeared; in this way, a total of 2,029 news articles on coronavirus, published from January to May 2020, are collected in the repository, along with 140,820 tweets that reveal how these articles have spread on the Twitter social network. The repository provides multimodal information about news articles on coronavirus, including textual, visual, temporal, and network information. Obtaining news credibility in this way allows a trade-off between dataset scalability and label accuracy. Extensive experiments are conducted to present data statistics and distributions, and to provide baseline performances for predicting news credibility so that future methods can be compared. Our repository is available at http://coronavirus-fakenews.com.
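The idea of articles inheriting the credibility of their publisher can be illustrated with a minimal sketch. It is not the repository's actual pipeline: the domain names, the 1/0 label encoding, and the helper names `publisher_domain` and `inherit_labels` are all assumptions made for illustration.

```python
from urllib.parse import urlparse

# Hypothetical publisher-level labels (assumed encoding: 1 = reliable, 0 = unreliable).
# Only publishers judged to have extreme credibility would appear here.
PUBLISHER_CREDIBILITY = {
    "reliable-example.org": 1,
    "unreliable-example.net": 0,
}


def publisher_domain(url):
    """Return the lower-cased host of a URL, without a leading 'www.'."""
    host = urlparse(url).netloc.lower()
    return host[4:] if host.startswith("www.") else host


def inherit_labels(article_urls, publisher_labels):
    """Give each article the label of its publisher.

    Articles whose publisher is not in the labeled set are skipped,
    which mirrors keeping only publishers with extreme credibility.
    """
    labeled = []
    for url in article_urls:
        label = publisher_labels.get(publisher_domain(url))
        if label is not None:
            labeled.append((url, label))
    return labeled


if __name__ == "__main__":
    urls = [
        "https://www.reliable-example.org/covid-update",
        "http://unreliable-example.net/miracle-cure",
        "https://unlisted-example.com/story",
    ]
    for url, label in inherit_labels(urls, PUBLISHER_CREDIBILITY):
        print(label, url)
```

Labeling at the publisher level requires no per-article fact-checking, which is what makes the collection scale; the cost is that an individual article may not match its publisher's overall credibility, which is the label-accuracy side of the trade-off.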
