A Framework for Public Health Surveillance

The rapid growth of social media creates an opportunity to augment traditional public health surveillance methods with social media data. We describe a publicly available framework for performing public health surveillance on Twitter data. The framework consists of three components that work together to detect health-related trends: a concept extraction component that identifies health-related concepts, a concept aggregation component that identifies how the extracted concepts relate to each other, and a trend detection component that determines when the aggregated concepts are trending. We describe the framework's architecture and the components already implemented in it, identify other components that could be used with it, and evaluate it on approximately 1.5 years of tweets. Although it is difficult to determine how accurately a Twitter trend reflects a real-world trend, we discuss how the trends detected by several different methods differ and compare the flu trends detected by our framework with data from Google Flu Trends.
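The three-stage design described above can be illustrated with a minimal sketch. It is not the framework's actual implementation: the toy lexicon, the function names, the co-occurrence counting used for aggregation, and the mean-plus-standard-deviation rule used for trend detection are all assumptions chosen for brevity.

```python
import re
from collections import Counter
from itertools import combinations
from statistics import mean, stdev

# Hypothetical toy lexicon mapping surface words to concept labels.
LEXICON = {"flu": "influenza", "influenza": "influenza",
           "fever": "fever", "cough": "cough"}


def extract_concepts(text):
    """Concept extraction: exact dictionary lookup on lowercase word tokens."""
    tokens = re.findall(r"[a-z]+", text.lower())
    return {LEXICON[t] for t in tokens if t in LEXICON}


def aggregate(concept_sets):
    """Concept aggregation: count each concept and each co-occurring pair."""
    counts, pairs = Counter(), Counter()
    for concepts in concept_sets:
        counts.update(concepts)
        pairs.update(combinations(sorted(concepts), 2))
    return counts, pairs


def is_trending(history, today, threshold=2.0):
    """Trend detection: flag today's count when it exceeds the historical
    mean by more than `threshold` sample standard deviations."""
    if len(history) < 2:
        return False  # not enough data to estimate a spread
    return today > mean(history) + threshold * stdev(history)


tweets = ["I have the flu and a fever", "Bad cough today", "flu, fever, cough"]
counts, pairs = aggregate(extract_concepts(t) for t in tweets)
```

Each stage consumes only the output of the previous one, so any single stage could be swapped for a different method (for example, a different extractor or trend detector) without changing the others.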
