Can online self-reports assist in real-time identification of influenza vaccination uptake? A cross-sectional study of influenza vaccine-related tweets in the USA, 2013–2017

Introduction: The Centers for Disease Control and Prevention (CDC) spends significant time and resources tracking influenza vaccination coverage each influenza season through national surveys. Emerging data from social media offer an alternative means of surveilling influenza vaccination coverage at both national and local levels in near real time.

Objectives: This study aimed to characterise and analyse the vaccinated population from temporal, demographical and geographical perspectives using automatic classification of vaccination-related Twitter data.

Methods: In this cross-sectional study, we continuously collected tweets containing both influenza-related terms and vaccine-related terms over four consecutive influenza seasons from 2013 to 2017. We created a machine learning classifier to identify relevant tweets, then evaluated the approach by comparing its estimates with data from the CDC's FluVaxView. We limited our analysis to tweets geolocated within the USA.

Results: We assessed 1 124 839 tweets. We found a strong correlation of 0.799 between monthly Twitter estimates and CDC data, with correlations as high as 0.950 in individual influenza seasons. Our approach also obtained geographical correlations of 0.387 at the US state level and 0.467 at the regional level. Finally, we found a higher level of influenza vaccine tweets among female users than among male users, which is consistent with the results of CDC surveys on vaccine uptake.

Conclusion: Significant correlations between Twitter data and CDC data show the potential of using social media for vaccination surveillance. Temporal variability is captured better than geographical and demographical variability. We discuss potential paths forward for leveraging this approach.
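The comparison between monthly Twitter estimates and CDC figures can be illustrated with a short sketch. It assumes the reported values are Pearson correlation coefficients, which the abstract does not state, and it uses made-up placeholder numbers rather than data from the study. The function name `pearson_r` and both input series are hypothetical.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equal-length sequences with at least 2 points")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    # Sum of co-deviations and sums of squared deviations from the means
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant series")
    return cov / sqrt(var_x * var_y)


# Hypothetical placeholder values, NOT data from the study:
# share of a season's vaccination tweets per month versus
# share of a season's surveyed vaccinations per month.
twitter_share = [0.05, 0.30, 0.35, 0.15, 0.08, 0.07]
survey_share = [0.04, 0.28, 0.38, 0.14, 0.09, 0.07]

r = pearson_r(twitter_share, survey_share)
print(f"r = {r:.3f}")
```

The same function could be applied per season or per state to obtain the seasonal and geographical comparisons, provided the two series are aligned on the same months or regions.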
