Identification of Keywords From Twitter and Web Blog Posts to Detect Influenza Epidemics in Korea

OBJECTIVE Social media data are a highly contextual health information source. The objective of this study was to identify Korean keywords for detecting influenza epidemics from social media data.

METHODS We included data from both Twitter and online blog posts to obtain a sufficient number of candidate indicators and to represent a larger proportion of the Korean population. We performed the following steps: initial keyword selection; generation of a keyword time series using a preprocessing approach; optimal feature selection; and model building and validation using least absolute shrinkage and selection operator (LASSO), support vector machine (SVM), and random forest regression (RFR).

RESULTS A total of 15 keywords, evenly distributed across the Twitter and blog data sources, optimally detected the influenza epidemic. Estimates generated by our SVM model were highly correlated with recent influenza incidence data.

CONCLUSIONS The basic principles underpinning our approach could be applied to other countries, languages, infectious diseases, and social media sources. Social media monitoring using our approach may support and extend the capacity of traditional surveillance systems for detecting emerging influenza. (Disaster Med Public Health Preparedness. 2018;12:352-359)
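The keyword time series step named in the methods can be illustrated with a minimal sketch. It is not the authors' preprocessing pipeline: the posts, the keyword, the weekly incidence values, and the function names `weekly_keyword_counts` and `pearson` are all hypothetical, and plain substring matching stands in for whatever Korean text preprocessing the study used. The sketch counts posts that mention a keyword in each ISO week and then checks how well that weekly series tracks an incidence series.

```python
from collections import Counter
from datetime import date
from math import sqrt


def weekly_keyword_counts(posts, keyword):
    """Count posts containing `keyword`, grouped by ISO (year, week)."""
    counts = Counter()
    for day, text in posts:
        if keyword in text:  # naive substring match (assumption)
            iso = day.isocalendar()
            counts[(iso[0], iso[1])] += 1
    return counts


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


# Hypothetical toy posts: (posting date, text). "독감" (flu) is an
# illustrative keyword, not one reported by the study.
posts = [
    (date(2014, 1, 6), "독감 걸렸다"),
    (date(2014, 1, 7), "오늘 날씨 좋다"),
    (date(2014, 1, 13), "독감 때문에 결석"),
    (date(2014, 1, 14), "독감 예방접종"),
    (date(2014, 1, 20), "독감 유행 중"),
    (date(2014, 1, 21), "독감 증상"),
    (date(2014, 1, 22), "독감 조심"),
]

# Hypothetical weekly influenza incidence for ISO weeks 2 to 4 of 2014.
weeks = [(2014, 2), (2014, 3), (2014, 4)]
incidence = [10.0, 20.0, 30.0]

counts = weekly_keyword_counts(posts, "독감")
series = [counts.get(week, 0) for week in weeks]
r = pearson(series, incidence)
print(series, r)
```

In a fuller pipeline, one such series per candidate keyword and per source would form the feature matrix passed to feature selection and to the LASSO, SVM, and RFR models.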
