Vera: Prediction Techniques for Reducing Harmful Misinformation in Consumer Health Search

The COVID-19 pandemic has brought about a proliferation of harmful news articles online, with sources lacking credibility and misrepresenting scientific facts. Misinformation has real consequences for consumer health search, i.e., users searching for health information. In the context of multi-stage ranking architectures, there has been little work exploring whether they prioritize correct and credible information over misinformation. We find that, indeed, training models on standard relevance ranking datasets such as MS MARCO passage, which have been curated to contain mostly credible information, yields models that may nevertheless promote harmful misinformation. To rectify this, we propose a label prediction technique that can separate helpful from harmful content. Our design leverages pretrained sequence-to-sequence transformer models for both relevance ranking and label prediction. Evaluated on the TREC 2020 Health Misinformation Track, our techniques yield the top-ranked system: our best submitted run scored 19.2 points higher than the second-best run on the primary metric, a 68% relative improvement. Additional post-hoc experiments show that we can boost effectiveness by another 3.5 points.
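To make the label prediction component concrete, the sketch below shows how a pretrained sequence-to-sequence model such as T5 can be used as a pointwise scorer in the monoT5 style: the query and passage are packed into a text prompt, and the model's single-step decoder probabilities for the words "true" and "false" serve as the score. This is a minimal sketch under stated assumptions, not the authors' released implementation; the checkpoint name, prompt template, and the helper function `label_score` are illustrative, and in practice a checkpoint fine-tuned for helpful/harmful label prediction would be substituted.

```python
# Minimal monoT5-style scoring sketch (illustrative, not the authors' code).
# Assumes a T5 checkpoint fine-tuned so that the first decoded token is
# "true" (helpful/relevant) or "false" (harmful/not relevant).
import torch
from transformers import T5ForConditionalGeneration, T5Tokenizer

MODEL_NAME = "t5-base"  # placeholder; a task-specific fine-tuned checkpoint is assumed

tokenizer = T5Tokenizer.from_pretrained(MODEL_NAME)
model = T5ForConditionalGeneration.from_pretrained(MODEL_NAME).eval()

def label_score(query: str, passage: str) -> float:
    """Return the probability mass the model puts on "true" vs. "false"."""
    prompt = f"Query: {query} Document: {passage} Relevant:"
    inputs = tokenizer(prompt, return_tensors="pt", truncation=True, max_length=512)
    # Decode exactly one step, starting from the decoder start token.
    decoder_input_ids = torch.tensor([[model.config.decoder_start_token_id]])
    with torch.no_grad():
        logits = model(**inputs, decoder_input_ids=decoder_input_ids).logits[0, -1]
    # SentencePiece word-initial tokens for "true" and "false" in the T5 vocab.
    true_id = tokenizer.convert_tokens_to_ids("▁true")
    false_id = tokenizer.convert_tokens_to_ids("▁false")
    probs = torch.softmax(logits[[true_id, false_id]], dim=0)
    return probs[0].item()
```

Scores of this form, one from a relevance ranker and one from a label predictor, can then be combined (for example, by a weighted linear combination) when reordering the candidate list produced by first-stage retrieval.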
