What Are People Tweeting About Zika? An Exploratory Study Concerning Its Symptoms, Treatment, Transmission, and Prevention

Background: In order to harness what people are tweeting about Zika, there needs to be a computational framework that leverages machine learning techniques to recognize relevant Zika tweets and, further, categorize these into disease-specific categories to address specific societal concerns related to the prevention, transmission, symptoms, and treatment of Zika virus.

Objective: The purpose of this study was to determine the relevancy of the tweets and what people were tweeting about the 4 disease characteristics of Zika: symptoms, transmission, prevention, and treatment.

Methods: A combination of natural language processing and machine learning techniques was used to determine what people were tweeting about Zika. Specifically, a two-stage classifier system was built to find relevant tweets about Zika, and then the tweets were categorized into 4 disease categories. Tweets in each disease category were then examined using latent Dirichlet allocation (LDA) to determine the 5 main tweet topics for each disease characteristic.

Results: Over 4 months, 1,234,605 tweets were collected. The number of tweets by males and females was similar (28.47% [351,453/1,234,605] and 23.02% [284,207/1,234,605], respectively). The classifier performed well on the training and test data for relevancy (F1 score=0.87 and 0.99, respectively) and disease characteristics (F1 score=0.79 and 0.90, respectively). Five topics for each category were found and discussed, with a focus on the symptoms category.

Conclusions: We demonstrate how categories of discussion on Twitter about an epidemic can be discovered so that public health officials can understand specific societal concerns within the disease-specific categories. Our two-stage classifier was able to identify relevant tweets to enable more specific analysis, including the specific aspects of Zika that were being discussed as well as misinformation being expressed. Future studies can capture sentiments and opinions on epidemic outbreaks like Zika virus in real time, which will likely inform efforts to educate the public at large.
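The two-stage design described in the Methods can be illustrated with a minimal sketch. The abstract does not say which learning algorithm was used, so the multinomial Naive Bayes model below is an assumption made purely for illustration, and the example tweets, the class `NaiveBayes`, and the helpers `classify` and `f1_score` are hypothetical. Stage 1 filters for relevance; stage 2 assigns a relevant tweet to one of the 4 disease categories. The F1 helper shows how scores of the kind reported in the Results are computed (the harmonic mean of precision and recall).

```python
import math
from collections import Counter, defaultdict


class NaiveBayes:
    """Tiny multinomial Naive Bayes with add-one smoothing (illustrative only)."""

    def fit(self, texts, labels):
        self.class_counts = Counter(labels)
        self.word_counts = defaultdict(Counter)
        for text, label in zip(texts, labels):
            self.word_counts[label].update(text.lower().split())
        self.vocab = {w for counts in self.word_counts.values() for w in counts}
        return self

    def predict(self, text):
        total_docs = sum(self.class_counts.values())
        best_label, best_score = None, -math.inf
        for label in sorted(self.class_counts):
            counts = self.word_counts[label]
            denom = sum(counts.values()) + len(self.vocab)
            score = math.log(self.class_counts[label] / total_docs)
            for word in text.lower().split():
                if word in self.vocab:  # words never seen in training are skipped
                    score += math.log((counts[word] + 1) / denom)
            if score > best_score:
                best_label, best_score = label, score
        return best_label


def f1_score(y_true, y_pred, positive):
    """F1 for one class: harmonic mean of precision and recall."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    if tp == 0:
        return 0.0
    precision, recall = tp / (tp + fp), tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


# Hypothetical toy training data; the study used manually labeled tweets.
category_train = [
    ("zika virus spreads through mosquito bites", "transmission"),
    ("fever and rash are zika symptoms", "symptoms"),
    ("use repellent to prevent zika infection", "prevention"),
    ("no vaccine or treatment for zika yet", "treatment"),
]
relevance_train = [(text, "relevant") for text, _ in category_train] + [
    ("zika is my favorite band name lol", "irrelevant"),
    ("watching the game tonight with friends", "irrelevant"),
]

stage1 = NaiveBayes().fit(*zip(*relevance_train))  # relevant vs. irrelevant
stage2 = NaiveBayes().fit(*zip(*category_train))   # 4 disease categories


def classify(tweet):
    """Return a disease category, or None if stage 1 rejects the tweet."""
    if stage1.predict(tweet) != "relevant":
        return None
    return stage2.predict(tweet)


print(classify("mosquito bites spread zika"))
print(classify("watching the game tonight"))
```

Training the second stage only on relevant tweets keeps the category model from having to represent off-topic text, which is one plausible motivation for splitting the task in two. Topic discovery with LDA would then run separately on the tweets within each predicted category.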
