Detecting COVID-19 Misinformation on Social Media

The ongoing pandemic has heightened the need for tools that flag COVID-19-related misinformation on the internet, particularly on social media platforms such as Twitter. However, because of novel language and rapidly changing information, existing misinformation detection datasets are not effective for evaluating systems designed to detect misinformation on this topic. Misinformation detection can be subdivided into two sub-tasks: (i) retrieval of misconceptions relevant to the posts being checked for veracity, and (ii) stance detection, which identifies whether each post agrees with, disagrees with, or expresses no stance towards the retrieved misconceptions. To facilitate research on this task, we release COVID-Lies, a dataset of 5K expert-annotated tweets for evaluating misinformation detection systems on 86 distinct pieces of COVID-19-related misinformation. We evaluate existing NLP systems on this dataset, providing the first benchmarks for it and identifying key challenges for future models to improve upon.
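To make the two sub-tasks concrete, here is a minimal retrieve-then-classify sketch in Python. It is not the method evaluated on COVID-Lies. Retrieval uses TF-IDF cosine similarity, a standard-library stand-in for whatever retrieval model a real system would use. `toy_stance` is a hypothetical keyword heuristic standing in for a trained stance classifier. All function names, the similarity threshold, the negation-cue list, and the example misconceptions are illustrative assumptions and are not drawn from the dataset.

```python
import math
import re
from collections import Counter
from typing import Callable, Dict, List, Tuple

# The three stance classes named in the abstract.
AGREE, DISAGREE, NO_STANCE = "agree", "disagree", "no_stance"

# Hypothetical cue list for the toy stance heuristic (an assumption, not from the paper).
NEGATION_CUES = {"not", "no", "false", "myth", "debunked", "fake", "hoax"}


def tokenize(text: str) -> List[str]:
    """Lowercase and keep runs of ASCII letters and digits."""
    return re.findall(r"[a-z0-9]+", text.lower())


def tfidf_vectors(docs: List[List[str]]) -> List[Dict[str, float]]:
    """Sparse TF-IDF vectors using a smoothed idf: log((1 + n) / (1 + df)) + 1."""
    n = len(docs)
    df = Counter(tok for doc in docs for tok in set(doc))
    idf = {tok: math.log((1 + n) / (1 + c)) + 1 for tok, c in df.items()}
    return [{tok: cnt * idf[tok] for tok, cnt in Counter(doc).items()} for doc in docs]


def cosine(u: Dict[str, float], v: Dict[str, float]) -> float:
    """Cosine similarity of two sparse vectors; 0.0 if either is empty."""
    dot = sum(w * v.get(tok, 0.0) for tok, w in u.items())
    nu = math.sqrt(sum(w * w for w in u.values()))
    nv = math.sqrt(sum(w * w for w in v.values()))
    return dot / (nu * nv) if nu and nv else 0.0


def retrieve(post: str, misconceptions: List[str], k: int = 1) -> List[Tuple[int, float]]:
    """Sub-task 1: rank misconceptions by similarity to the post, return the top k."""
    docs = [tokenize(m) for m in misconceptions] + [tokenize(post)]
    vecs = tfidf_vectors(docs)
    post_vec = vecs[-1]
    scores = [(i, cosine(post_vec, vecs[i])) for i in range(len(misconceptions))]
    # sorted() is stable, so ties keep the original misconception order.
    return sorted(scores, key=lambda s: s[1], reverse=True)[:k]


def toy_stance(post: str, misconception: str, similarity: float,
               threshold: float = 0.2) -> str:
    """Sub-task 2 (hypothetical placeholder for a trained stance model)."""
    if similarity < threshold:
        return NO_STANCE
    extra = set(tokenize(post)) - set(tokenize(misconception))
    # Negation cues in the post but not in the misconception suggest disagreement.
    return DISAGREE if extra & NEGATION_CUES else AGREE


def detect(post: str, misconceptions: List[str],
           stance_fn: Callable[[str, str, float], str] = toy_stance,
           k: int = 1) -> List[Tuple[int, str]]:
    """Run retrieval, then stance detection on each retrieved misconception."""
    return [(idx, stance_fn(post, misconceptions[idx], score))
            for idx, score in retrieve(post, misconceptions, k)]


if __name__ == "__main__":
    claims = ["Drinking bleach cures COVID-19", "5G networks spread the coronavirus"]
    for p in ["Drinking bleach will cure COVID-19, try it",
              "It is a myth that 5G networks spread the coronavirus",
              "Stay home and wash your hands"]:
        print(p, "->", detect(p, claims))
```

Passing the stance model in as `stance_fn` mirrors the sub-task split: the retrieval stage and the stance stage can be replaced and evaluated independently.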
