Recommending research articles to consumers of online vaccination information

Online health communications often provide biased interpretations of evidence and have unreliable links to the source research. We tested the feasibility of a tool for matching web pages to their source evidence. Using 3,573 unique links to web pages from Altmetric and a candidate set of 207,538 eligible vaccination-related PubMed articles, we evaluated methods for ranking the source articles for vaccine-related research described on web pages. We compared simple baseline feature representation and dimensionality reduction approaches to those augmented with canonical correlation analysis (CCA). Performance measures were the median rank of the correct source article; the percentage of web pages for which the source article was correctly ranked first (recall@1); and the percentage for which it was ranked within the top 50 candidate articles (recall@50). While augmenting baseline methods using CCA generally improved results, no CCA-based approach outperformed a baseline method, which ranked the correct source article first for over one quarter of web pages and in the top 50 for more than half. Tools to help people identify evidence-based sources for the content they access on vaccination-related web pages are potentially feasible and may support the prevention of bias and misrepresentation of research in news and social media.
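
As an illustration of the three performance measures named above, the following is a minimal sketch in Python, not the authors' implementation. It assumes each web page already has one similarity score per candidate article, from whichever ranking method is being tested. The function names, the toy scores, and the tie-handling rule (ties resolved in favour of the correct article) are all assumptions made for the example.

```python
from statistics import median


def rank_of_source(scores, source_index):
    """Return the 1-based rank of the correct source article for one web page.

    `scores` holds one similarity score per candidate article (higher is better).
    Ties are resolved in favour of the correct article, which is an assumption.
    """
    target = scores[source_index]
    return 1 + sum(1 for s in scores if s > target)


def evaluate(score_rows, source_indices, ks=(1, 50)):
    """Summarise ranking quality over all web pages.

    Returns the median rank of the correct article and, for each k,
    the percentage of pages whose correct article is ranked within the top k.
    """
    ranks = [rank_of_source(row, idx) for row, idx in zip(score_rows, source_indices)]
    summary = {"median_rank": median(ranks)}
    for k in ks:
        summary[f"recall@{k}"] = 100.0 * sum(r <= k for r in ranks) / len(ranks)
    return summary


# Hypothetical toy data: 4 web pages scored against 3 candidate articles.
score_rows = [
    [0.9, 0.1, 0.3],  # correct article is index 0, ranked 1st
    [0.2, 0.5, 0.4],  # correct article is index 2, ranked 2nd
    [0.7, 0.6, 0.1],  # correct article is index 2, ranked 3rd
    [0.1, 0.8, 0.3],  # correct article is index 1, ranked 1st
]
source_indices = [0, 2, 2, 1]

result = evaluate(score_rows, source_indices, ks=(1, 2))
print(result)
```

With a realistic candidate set, `ks=(1, 50)` would give the recall@1 and recall@50 figures described in the abstract. The toy example uses `ks=(1, 2)` only because it has three candidates.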
