Mining social media streams to improve public health allergy surveillance

Allergies are among the most common chronic diseases worldwide. One in five Americans suffers from allergy or asthma symptoms. As social media use has spread, a growing number of people share their experiences and opinions about personal health symptoms and concerns online. Mining this publicly available health-related data can potentially provide valuable healthcare insights. In this paper, we propose a real-time allergy surveillance system that first classifies tweets to identify those that mention actual allergy incidents, using a bag-of-words model and a NaiveBayesMultinomial classifier, and then applies in-depth text and spatiotemporal analysis. Our experimental results show that the proposed system can detect predominant allergy types with high precision and that allergy-related tweet volume is highly correlated with weather data (daily maximum temperature). We believe that this is the first study to examine a large-scale social media stream for in-depth analysis of allergy activity.
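The two technical steps named in the abstract, bag-of-words classification with a multinomial Naive Bayes model and correlation of tweet volume with daily maximum temperature, can be illustrated with a minimal Python sketch. This is not the system's implementation: the whitespace tokenizer, the Laplace smoothing constant `alpha`, the class labels `incident` and `other`, and all toy tweets and numbers below are assumptions made up for illustration.

```python
import math
from collections import Counter, defaultdict


def tokenize(text):
    # Assumed tokenizer: lowercase and split on whitespace.
    return text.lower().split()


class MultinomialNB:
    """Multinomial Naive Bayes over bag-of-words counts (illustrative only)."""

    def __init__(self, alpha=1.0):
        self.alpha = alpha  # Laplace smoothing constant (assumed value)

    def fit(self, docs, labels):
        self.classes = sorted(set(labels))
        self.doc_counts = Counter(labels)
        self.word_counts = defaultdict(Counter)
        self.vocab = set()
        for doc, label in zip(docs, labels):
            tokens = tokenize(doc)
            self.word_counts[label].update(tokens)
            self.vocab.update(tokens)
        self.total_words = {
            c: sum(self.word_counts[c].values()) for c in self.classes
        }
        self.n_docs = len(labels)
        return self

    def predict(self, doc):
        vocab_size = len(self.vocab)
        scores = {}
        for c in self.classes:
            # Log prior plus the sum of smoothed log likelihoods.
            score = math.log(self.doc_counts[c] / self.n_docs)
            denom = self.total_words[c] + self.alpha * vocab_size
            for tok in tokenize(doc):
                if tok in self.vocab:  # ignore words never seen in training
                    count = self.word_counts[c][tok]
                    score += math.log((count + self.alpha) / denom)
            scores[c] = score
        return max(scores, key=scores.get)


def pearson(xs, ys):
    # Pearson correlation coefficient between two equal-length sequences.
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sd_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    sd_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (sd_x * sd_y)


# Invented toy training tweets; not data from the paper.
train_docs = [
    "my allergies are killing me today sneezing nonstop",
    "pollen allergy made my eyes itchy all morning",
    "sneezing and itchy eyes thanks to pollen",
    "new article about allergy research funding",
    "buy allergy medicine online cheap discount",
    "allergy awareness week starts next month",
]
train_labels = ["incident", "incident", "incident", "other", "other", "other"]

clf = MultinomialNB().fit(train_docs, train_labels)
label_a = clf.predict("itchy eyes and sneezing all day")
label_b = clf.predict("cheap allergy medicine discount")

# Invented daily series: counts of tweets classified as incidents and
# daily maximum temperatures for the same days.
daily_incident_tweets = [12, 15, 21, 30, 28, 35]
daily_max_temp_c = [14.0, 16.5, 19.0, 24.0, 23.0, 27.5]
r = pearson(daily_incident_tweets, daily_max_temp_c)
```

In a pipeline of this shape, only tweets labeled `incident` would feed the daily counts, so the correlation reflects reported allergy episodes rather than every mention of the word "allergy".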
