Detecting and Analyzing Influenza Epidemics with Social Media in China

In recent years, social media has become important and omnipresent for social network and information sharing. Researchers and scientists have begun to mine social media data to predict varieties of social, economic, health and entertainment related real-world phenomena. In this paper, we exhibit how social media data can be used to detect and analyze real-world phenomena with several data mining techniques. Specifically, we use posts from TencentWeibo to detect influenza and analyze influenza trends. We build a support vector machine (SVM) based classifier to classify influenza posts. In addition, we use association rule mining to extract strongly associated features as additional features of posts to overcome the limitation of 140 words for posts. We also use sentimental analysis to classify the reposts without feature and uncommented reposts. The experimental results show that by combining those techniques, we can improve the precision and recall by at least ten percent. Finally, we analyze the spatial and temporal patterns for positive influenza posts and tell when and where influenza epidemic is more likely to occur.

[1]  Ulrich Güntzer,et al.  Algorithms for association rule mining — a general survey and comparison , 2000, SKDD.

[2]  S. Magruder Evaluation of Over-the-Counter Pharmaceutical Sales As a Possible Early Warning Indicator of Human Disease , 2003 .

[3]  Bernardo A. Huberman,et al.  Predicting the Future with Social Media , 2010, Web Intelligence.

[4]  Isabell M. Welpe,et al.  Predicting Elections with Twitter: What 140 Characters Reveal about Political Sentiment , 2010, ICWSM.

[5]  Corinna Cortes,et al.  Support-Vector Networks , 1995, Machine Learning.

[6]  Mizuki Morita,et al.  Twitter Catches The Flu: Detecting Influenza Epidemics using Twitter , 2011, EMNLP.

[7]  Kurt Hornik,et al.  The support vector machine under test , 2003, Neurocomputing.

[8]  Nello Cristianini,et al.  Flu Detector - Tracking Epidemics on Twitter , 2010, ECML/PKDD.

[9]  อนิรุธ สืบสิงห์,et al.  Data Mining Practical Machine Learning Tools and Techniques , 2014 .

[10]  Bo Pang,et al.  Thumbs up? Sentiment Classification using Machine Learning Techniques , 2002, EMNLP.

[11]  Prem Melville,et al.  Sentiment analysis of blogs by combining lexical knowledge with text classification , 2009, KDD.

[12]  Mike Thelwall,et al.  Sentiment in short strength detection informal text , 2010 .

[13]  Henry A. Kautz,et al.  Predicting Disease Transmission from Geo-Tagged Micro-Blog Data , 2012, AAAI.

[14]  Michael M. Wagner,et al.  Telephone Triage: A Timely Data Source for Surveillance of Influenza-like Diseases , 2003, AMIA.

[15]  Ramakrishnan Srikant,et al.  Fast Algorithms for Mining Association Rules in Large Databases , 1994, VLDB.

[16]  Chih-Jen Lin,et al.  A Guide to Support Vector Machines , 2006 .

[17]  Ramakrishnan Srikant,et al.  Fast algorithms for mining association rules , 1998, VLDB 1998.

[18]  Peter A. Flach,et al.  Evaluation Measures for Multi-class Subgroup Discovery , 2009, ECML/PKDD.

[19]  Das Amrita,et al.  Mining Association Rules between Sets of Items in Large Databases , 2013 .

[20]  D G Krige,et al.  A statistical approach to some mine valuation and allied problems on the Witwatersrand , 2015 .

[21]  Jeremy Ginsberg,et al.  Detecting influenza epidemics using search engine query data , 2009, Nature.

[22]  Hinrich Schütze,et al.  Introduction to information retrieval , 2008 .