Catching Zika Fever: Application of Crowdsourcing and Machine Learning for Tracking Health Misinformation on Twitter

In February 2016, the World Health Organization declared the Zika outbreak a Public Health Emergency of International Concern. With mounting evidence that it can cause birth defects, and with the Summer Olympics approaching in the worst-affected country, Brazil, the virus caught fire on social media. In this work, we use Zika as a case study in building a tool for tracking misinformation around health concerns on Twitter. We collect more than 13 million tweets regarding the Zika outbreak and track rumors outlined by the World Health Organization and the Snopes fact-checking website. The tool pipeline, which incorporates health professionals, crowdsourcing, and machine learning, allows us to capture health-related rumors around the world, as well as clarification campaigns by reputable health organizations. We discover extremely bursty behavior of rumor-related topics, and show that, once a questionable topic is detected, it is possible to identify rumor-bearing tweets using automated techniques.
