Predicting the Flu from Instagram

Conventional surveillance systems for monitoring infectious diseases, such as influenza, face challenges due to a shortage of skilled healthcare professionals, the remoteness of communities, and the absence of communication infrastructure. Internet-based approaches to surveillance are appealing both logistically and economically. Search engine queries and Twitter have been the primary data sources in such approaches. The aim of this study is to assess the predictive power of an alternative data source, Instagram. Using 317 weeks of publicly available data from Instagram, we trained several machine learning algorithms to both nowcast and forecast the number of official influenza-like illness incidents in Finland, where population-wide official statistics on weekly incidents are available. In addition to the date and hashtag count features of online posts, we also used the visual content of the posted images with the help of deep convolutional neural networks. Our best nowcasting model reached a mean absolute error of 11.33 incidents per week and a correlation coefficient of 0.963 on the test data. Forecasting models for predicting 1 week and 2 weeks ahead were also statistically significant, reaching correlation coefficients of 0.903 and 0.862, respectively. This study demonstrates how social media, and in particular the digital photographs shared on it, can be a valuable source of information for the field of infodemiology.
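
The distinction between nowcasting and forecasting, and the two reported evaluation metrics, can be illustrated with a minimal Python sketch. It is not the study's code: the function names and the `horizon` parameter are hypothetical, and the numbers in the usage note are made up purely to show the arithmetic. Here, a horizon of 0 pairs each week's features with the same week's incident count (nowcasting), while a horizon of 1 or 2 pairs them with the count 1 or 2 weeks later (forecasting).

```python
import math


def align_for_horizon(features, incidents, horizon):
    """Pair week-t features with the incident count at week t + horizon.

    horizon = 0 corresponds to nowcasting; horizon = 1 or 2 to forecasting
    that many weeks ahead. (Hypothetical helper, for illustration only.)
    """
    if horizon < 0:
        raise ValueError("horizon must be non-negative")
    if len(features) != len(incidents):
        raise ValueError("features and incidents must cover the same weeks")
    n = len(incidents) - horizon
    return features[:n], incidents[horizon:]


def mean_absolute_error(y_true, y_pred):
    """Average absolute difference between observed and predicted counts."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    norm_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    norm_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (norm_x * norm_y)
```

For example, with made-up weekly values, `align_for_horizon([1, 2, 3, 4], [10, 20, 30, 40], 1)` drops the last feature row and the first target, so that each remaining feature row is matched with the following week's count. Any regression model could then be fit on the aligned pairs and scored with the two metric functions.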
