Monitoring events with application to syndromic surveillance using social media data

The availability of time series data in many domains has led to a range of approaches for outbreak detection. A popular approach is to monitor daily counts of events. However, the time between events (TBE) has proven to be a useful alternative, especially for sudden, unexpected events. Past work that uses TBE for monitoring events assumes that the in‐control number of events is at most 10 per day. In this article, we derive robust monitoring plans that scale to in‐control counts higher than 10 but less than 100 per counting period (eg, day). TBE values are generally nonhomogeneous both across days and within days. This nonhomogeneity makes training the monitoring plans a challenge and increases the volume of data needed to design the charts. This article discusses these challenges and suggests solutions for data that are known to be Weibull‐distributed. We present our results in two parts. The first uses a simulated dataset in which parameters of the plan, such as the daily counts of events, are controlled. The second shows how the monitoring plans can be applied to the detection of syndromes (ie, disease outbreaks) using social media data.
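
As a rough illustration of TBE monitoring for Weibull‐distributed data, the sketch below flags unusually short times between events using a one‐sided probability limit. It is not the monitoring plan derived in the article: the single‐observation limit, the function names, the parameter values, and the false‐alarm rate are all assumptions chosen for illustration, and the in‐control distribution is treated as homogeneous, which the abstract notes is generally not the case.

```python
import math
import random


def weibull_lower_limit(shape, scale, alpha):
    """Return the TBE value L with P(T < L) = alpha for T ~ Weibull(shape, scale).

    Follows from the Weibull CDF F(t) = 1 - exp(-(t / scale) ** shape).
    """
    return scale * (-math.log(1.0 - alpha)) ** (1.0 / shape)


def signals(tbe_values, shape, scale, alpha=0.005):
    """Indices of TBE values below the lower limit (events arriving too fast)."""
    limit = weibull_lower_limit(shape, scale, alpha)
    return [i for i, t in enumerate(tbe_values) if t < limit]


# Hypothetical in-control parameters (assumed, not from the article):
# roughly 24 events per day, so the mean TBE is close to 1 hour.
SHAPE, SCALE = 1.5, 1.1

rng = random.Random(0)
# random.weibullvariate takes (scale, shape) in that order.
in_control = [rng.weibullvariate(SCALE, SHAPE) for _ in range(200)]
# Simulated outbreak: events arrive about four times faster.
outbreak = [rng.weibullvariate(SCALE / 4, SHAPE) for _ in range(50)]

flagged = signals(in_control + outbreak, SHAPE, SCALE)
print(f"lower limit: {weibull_lower_limit(SHAPE, SCALE, 0.005):.4f} hours")
print(f"flagged indices: {flagged}")
```

A limit applied to single observations reacts only to very short gaps; plans that accumulate evidence over several events, such as EWMA or CUSUM charts, are usually preferred when earlier detection of smaller shifts matters.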
