Text and Structural Data Mining of Influenza Mentions in Web and Social Media

Text and structural data mining of web and social media (WSM) provides a novel disease surveillance resource and can identify online communities for targeted public health communications (PHC), helping to ensure wide dissemination of pertinent information. WSM posts mentioning influenza were harvested over a 24-week period, from 5 October 2008 to 21 March 2009. Link analysis reveals communities suitable for targeted PHC. Text mining is shown to identify trends in flu posts that correlate with real-world influenza-like illness (ILI) patient report data. We also apply a graph-based data mining technique to detect anomalies among flu blogs connected by publisher type, links, and user tags.
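The trend-correlation step can be illustrated with a minimal sketch. It assumes that posts arrive as (date, text) pairs and that a post counts as a flu mention if it contains the whole word "flu" or "influenza". This keyword rule is hypothetical; the study's actual text-mining pipeline is not described here. The sketch bins mentions into 7-day weeks from the start of the collection window and compares them with a weekly influenza-like illness series using Pearson correlation, which is also an assumption. The function names `weekly_flu_counts` and `pearson`, the sample posts, and the ILI numbers are all illustrative and are not data from the study.

```python
import math
import re
from collections import Counter
from datetime import date

# Hypothetical keyword rule: a post counts as a flu mention if it contains
# the whole word "flu" or "influenza" (case-insensitive).
FLU_PATTERN = re.compile(r"\b(?:flu|influenza)\b", re.IGNORECASE)


def weekly_flu_counts(posts, start: date, n_weeks: int) -> list:
    """Count matching posts in each 7-day bin beginning at `start`.

    `posts` is an iterable of (date, text) pairs; posts outside the
    n_weeks window are ignored.
    """
    counts = Counter()
    for day, text in posts:
        week = (day - start).days // 7  # negative for posts before `start`
        if 0 <= week < n_weeks and FLU_PATTERN.search(text):
            counts[week] += 1
    return [counts[w] for w in range(n_weeks)]


def pearson(xs, ys) -> float:
    """Pearson correlation coefficient of two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    if sx == 0 or sy == 0:
        raise ValueError("correlation is undefined for a constant series")
    return cov / (sx * sy)


if __name__ == "__main__":
    # Toy data only; these are not results from the study.
    start = date(2008, 10, 5)  # first day of the collection window
    posts = [
        (date(2008, 10, 6), "Got my flu shot today"),
        (date(2008, 10, 14), "Influenza season is here"),
        (date(2008, 10, 15), "Flu is going around the office"),
        (date(2008, 10, 22), "Nothing health-related in this post"),
        (date(2008, 10, 27), "Home sick with the flu"),
        (date(2008, 10, 28), "FLU clinic hours posted"),
        (date(2008, 10, 29), "Influenza-like illness reports rising"),
    ]
    weekly = weekly_flu_counts(posts, start, n_weeks=4)
    ili_reports = [12, 19, 4, 33]  # hypothetical weekly ILI counts
    print(weekly)
    print(round(pearson(weekly, ili_reports), 3))
```

Running the script prints the weekly mention counts and their correlation with the toy ILI series. On Python 3.10 and later, `statistics.correlation(x, y)` computes the same Pearson coefficient and could replace the hand-written `pearson` function. The function is written out here to make the arithmetic explicit.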
