Distance-based outliers method for detecting disease outbreaks using social media

Forecasting disease outbreaks can inform decisions about how public health resources are allocated. Social media offers a low-cost alternative data source for public health surveillance. In this research we use Twitter data to demonstrate the detection of influenza outbreaks. We use a distance-based outlier method to transform the noisy Twitter data into regions, and then apply region-based hypothesis testing to those regions for rapid outbreak detection. Decisions within committees are made by majority voting. Our simulations show good accuracy and robustness.
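As a rough illustration of the two ideas named above, the sketch below flags distance-based outliers in a one-dimensional series and combines several detectors by majority vote. It is not the authors' implementation: it assumes the common DB(p, d) formulation (a point is an outlier when at least a fraction p of the other points lie farther than d from it), and the function names, the weekly counts, and the committee settings are all hypothetical. The region construction and region-based hypothesis testing are not shown, since the abstract does not specify them.

```python
from typing import List, Sequence


def db_outliers(values: Sequence[float], p: float, d: float) -> List[bool]:
    """Flag DB(p, d) outliers: a point is flagged when at least a
    fraction p of the other points lie farther than d from it."""
    n = len(values)
    flags = []
    for i, x in enumerate(values):
        # Count the other points whose absolute distance from x exceeds d.
        far = sum(1 for j, y in enumerate(values) if j != i and abs(x - y) > d)
        flags.append(n > 1 and far >= p * (n - 1))
    return flags


def majority_vote(votes: Sequence[bool]) -> bool:
    """Return True when strictly more than half of the votes are True."""
    return sum(votes) * 2 > len(votes)


# Hypothetical weekly counts of flu-related tweets (made-up numbers).
weekly_counts = [12, 15, 11, 14, 13, 60, 12]

# Hypothetical committee: the same detector run with three (p, d) settings.
settings = [(0.9, 20.0), (0.8, 30.0), (0.95, 40.0)]
committee = [db_outliers(weekly_counts, p, d) for p, d in settings]

# A week is reported as anomalous if most committee members flag it.
alarms = [
    majority_vote([member[i] for member in committee])
    for i in range(len(weekly_counts))
]
print(alarms)
```

With these made-up numbers only the week with 60 tweets is flagged, because it lies far from every other week under all three settings. Voting across several (p, d) settings is one simple way to reduce sensitivity to any single threshold choice.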
