Automated time series forecasting for biosurveillance

For robust detection performance, traditional control chart monitoring for biosurveillance is based on input data free of trends, day-of-week effects, and other systematic behaviour. Time series forecasting methods may be used to remove this behaviour by subtracting forecasts from observations to form residuals for algorithmic input. We describe three forecast methods and compare their predictive accuracy on each of 16 authentic syndromic data streams. The methods are (1) a non-adaptive regression model using a long historical baseline, (2) an adaptive regression model with a shorter, sliding baseline, and (3) the Holt-Winters method for generalized exponential smoothing. Criteria for comparing the forecasts were the root-mean-square error, the median absolute per cent error (MedAPE), and the median absolute deviation. The median-based criteria showed best overall performance for the Holt-Winters method. The MedAPE measures over the 16 test series averaged 16.5, 11.6, and 9.7 for the non-adaptive regression, adaptive regression, and Holt-Winters methods, respectively. The non-adaptive regression forecasts were degraded by changes in the data behaviour in the fixed baseline period used to compute model coefficients. The mean-based criterion was less conclusive because of the effects of poor forecasts on a small number of calendar holidays. The Holt-Winters method was also most effective at removing serial autocorrelation, with most 1-day-lag autocorrelation coefficients below 0.15. The forecast methods were compared without tuning them to the behaviour of individual series. We achieved improved predictions with such tuning of the Holt-Winters method, but practical use of such improvements for routine surveillance will require reliable data classification methods.
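As an illustration of the residual-forming step and the comparison criteria described above, the following is a minimal sketch rather than the authors' implementation. It assumes an additive Holt-Winters form with a 7-day seasonal cycle, a simple first-week initialisation, and placeholder smoothing parameters (`alpha`, `beta`, `gamma`), none of which are specified in the abstract. The median absolute deviation is taken here as the median of the absolute forecast errors, which is also an assumption. All function names are hypothetical.

```python
import math
from statistics import median


def holt_winters_additive(y, period=7, alpha=0.4, beta=0.0, gamma=0.15):
    """One-step-ahead additive Holt-Winters forecasts for y[period:].

    The parameter values and the initialisation are illustrative
    assumptions, not values taken from the source.
    """
    if len(y) < 2 * period:
        raise ValueError("need at least two full seasonal cycles")
    # Initialise level as the first-cycle mean, trend as zero, and the
    # seasonal terms as first-cycle deviations from that mean.
    level = sum(y[:period]) / period
    trend = 0.0
    season = [y[i] - level for i in range(period)]
    forecasts = []
    for t in range(period, len(y)):
        s = season[t % period]
        forecasts.append(level + trend + s)  # forecast made before seeing y[t]
        prev_level = level
        level = alpha * (y[t] - s) + (1 - alpha) * (prev_level + trend)
        trend = beta * (level - prev_level) + (1 - beta) * trend
        season[t % period] = gamma * (y[t] - level) + (1 - gamma) * s
    return forecasts


def rmse(obs, fc):
    """Root-mean-square forecast error."""
    return math.sqrt(sum((o - f) ** 2 for o, f in zip(obs, fc)) / len(obs))


def med_ape(obs, fc):
    """Median absolute per cent error; days with a zero count are skipped."""
    return 100.0 * median(abs(o - f) / abs(o) for o, f in zip(obs, fc) if o != 0)


def med_abs_error(obs, fc):
    """Median of absolute forecast errors (assumed meaning of the third criterion)."""
    return median(abs(o - f) for o, f in zip(obs, fc))


def lag1_autocorr(x):
    """Lag-1 sample autocorrelation; returns 0.0 for a constant series."""
    m = sum(x) / len(x)
    den = sum((v - m) ** 2 for v in x)
    if den == 0:
        return 0.0
    return sum((x[i] - m) * (x[i - 1] - m) for i in range(1, len(x))) / den


# Usage on a synthetic daily count series with a fixed weekly pattern.
weekly = [120, 110, 105, 100, 95, 60, 40]
series = [weekly[t % 7] for t in range(56)]
forecasts = holt_winters_additive(series)
observed = series[7:]
# Residuals are what a control chart would monitor in place of raw counts.
residuals = [o - f for o, f in zip(observed, forecasts)]
```

Passing the residuals, rather than the raw counts, to a detection algorithm is the step the abstract describes. On real data the smoothing parameters would need to be chosen with care, and holidays would need separate handling, as the abstract notes for the mean-based criterion.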
