Flu Detector - Tracking Epidemics on Twitter

We present an automated tool with a web interface for tracking the prevalence of Influenza-like Illness (ILI) in several regions of the United Kingdom using the contents of Twitter's microblogging service. Our data consists of a daily average of approximately 200,000 geolocated tweets, collected by targeting 49 urban centres in the UK over a period of 40 weeks. Official ILI rates from the Health Protection Agency (HPA) form our ground truth. Bolasso, the bootstrapped version of LASSO, is applied to extract a consistent set of features, which are then used to learn a regression model.
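The feature-selection step can be illustrated with a minimal sketch of the Bolasso idea: fit LASSO on several bootstrap resamples and keep only the features whose coefficients are nonzero in every fit. This is not the tool's actual implementation. The function names, the penalty `lam`, the number of bootstraps, and the synthetic data (standing in for tweet-derived term frequencies and HPA ILI rates) are all assumptions made for illustration.

```python
import random


def soft_threshold(z, t):
    """Shrink z toward zero by t (the proximal operator of the L1 penalty)."""
    if z > t:
        return z - t
    if z < -t:
        return z + t
    return 0.0


def lasso_cd(X, y, lam, n_iter=100):
    """Coordinate-descent LASSO without intercept.

    Minimises (1 / 2n) * ||y - Xw||^2 + lam * ||w||_1.
    """
    n, p = len(X), len(X[0])
    w = [0.0] * p
    resid = list(y)  # residual y - Xw, with w starting at zero
    col_sq = [sum(X[i][j] ** 2 for i in range(n)) / n for j in range(p)]
    for _ in range(n_iter):
        for j in range(p):
            if col_sq[j] == 0.0:
                continue  # constant-zero column carries no information
            # Correlation of feature j with the residual that excludes j.
            rho = sum(X[i][j] * (resid[i] + X[i][j] * w[j]) for i in range(n)) / n
            new_w = soft_threshold(rho, lam) / col_sq[j]
            delta = new_w - w[j]
            if delta != 0.0:
                for i in range(n):
                    resid[i] -= X[i][j] * delta
                w[j] = new_w
    return w


def bolasso_support(X, y, lam, n_boot=16, seed=0, tol=1e-8):
    """Intersect the LASSO supports obtained on n_boot bootstrap resamples."""
    rng = random.Random(seed)
    n, p = len(X), len(X[0])
    support = set(range(p))
    for _ in range(n_boot):
        idx = [rng.randrange(n) for _ in range(n)]  # sample rows with replacement
        w = lasso_cd([X[i] for i in idx], [y[i] for i in idx], lam)
        support &= {j for j in range(p) if abs(w[j]) > tol}
    return sorted(support)


# Synthetic stand-in data (assumption): 80 observations, 6 candidate features,
# of which only features 0 and 2 actually drive the target.
data_rng = random.Random(1)
X = [[data_rng.gauss(0, 1) for _ in range(6)] for _ in range(80)]
y = [3 * row[0] - 2 * row[2] + data_rng.gauss(0, 0.1) for row in X]

selected = bolasso_support(X, y, lam=0.1)
print("features kept in every bootstrap fit:", selected)
```

The indices in `selected` would then be the inputs to a separate regression model. Intersecting the supports is what makes the selection conservative: a feature that enters only by chance in some resamples is dropped.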
