Issues in applied statistics for public health bioterrorism surveillance using multiple data streams: research needs

The objective of this report is to inform decisions about priorities for statistical research initiatives in public health surveillance for emerging threats. Rapid advances in information systems have made a vast range of secondary data sources available for enhancing awareness of the situation and health status of populations. While medical informatics and initiatives to standardize records of healthcare-seeking encounters continue to accelerate, analytic and statistical methodologies must be adapted to mature in step with these sibling information science technologies. One major rite of passage for statistical inference is to advance the optimal application of analytic methodologies for using multiple data streams to detect and characterize population-level public health events of importance. This report first describes the problem in general and the data context, then delineates more specifically the practical nature of the problem and the related issues. Approaches currently applied to data using time-series, statistical process control, and traditional inference concepts are described with examples in the section on Statistics and the Role of the Analytic Surveillance Data Monitor. These are the techniques that currently give surveillance professionals substantive results and enable the use of multiple data streams. The next section describes a more complex approach that takes temporal as well as spatial dimensions into consideration for detection and situational awareness regarding event distributions. The space-time scan statistic has been used successfully to detect and track public health events of interest. Important research questions, which are summarized at the end of this report, are described in more detail in the sections on the methods to which they relate; presenting them in context is intended to clarify the research requirements summarized later.
Following the description of the space-time scan statistic application, this report turns to a less traditional area that shows promise in light of recent applications of analytic methods. Bayesian networks (BNs) represent a conceptual step forward, offering the public health surveillance community the advantage of flexibility. The progression from traditional to more extended statistical concepts, set against the changing responsibilities and challenges of current practice, leads to a conclusion consisting of categorized research needs. The report is structured to inform judgment about how to build on practical systems to achieve better analytic outcomes for public health surveillance. Research issues are referenced throughout the sections and summarized at the end; the summary also includes items not mentioned earlier in the report.
