Models for surveillance data under reporting delay: applications to US veteran first-time suicide attempters

Surveillance data are a vital source of information for assessing the spread of a health problem or disease of interest and for planning future health-care needs. However, using surveillance data requires adjusting the reported caseload for underreporting caused by reporting delays within a limited observation period. Although methods are available to address this classic statistical problem, they focus largely on inference for the reporting delay distribution, with inference about disease incidence derived from estimates of that distribution. This approach limits the complexity of the incidence models that can be fitted, and hence the reliability of incidence estimates and projections. Many of the available methods also lack robustness because they require parametric distributional assumptions. We propose a new approach that overcomes these limitations by allowing separate, distribution-free models for the incidence and the reporting delay, with joint inference for both modeling components based on functional response models. In addition, we discuss inference about projections of future disease incidence to help identify significant shifts in the temporal trends estimated from the observed data. This latter issue of detecting ‘change points’ is not sufficiently addressed in the literature, even though such warning signs of a potential outbreak are critically important for prevention. We illustrate the approach with both simulated and real data, the latter on suicide attempts from the Veteran Healthcare Administration.
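
To make the underreporting problem concrete, here is a minimal sketch of the classic delay-based adjustment that the abstract contrasts with the proposed approach: each reported count is divided by the probability that a case from that period would have been reported by the end of the observation window. This is not the functional response model proposed in the paper. The function name, the example counts, and the delay distribution are hypothetical, and the delay distribution is assumed known here, whereas in practice it must be estimated.

```python
def adjust_for_reporting_delay(reported, delay_cdf):
    """Inflate reported counts to account for cases not yet reported.

    reported[t]  : cases with onset in period t that were reported by the
                   end of the last period (hypothetical illustration data).
    delay_cdf[d] : assumed-known probability that the reporting delay is
                   at most d periods.
    """
    last_period = len(reported) - 1
    adjusted = []
    for t, count in enumerate(reported):
        # A case with onset in period t is visible only if its delay is
        # no longer than the time remaining in the observation window.
        max_observable_delay = last_period - t
        d = min(max_observable_delay, len(delay_cdf) - 1)
        prob_reported = delay_cdf[d]
        if prob_reported <= 0:
            raise ValueError("reporting probability must be positive")
        adjusted.append(count / prob_reported)
    return adjusted


# Hypothetical example: four periods, delays of 0 to 3 periods.
reported = [100, 90, 60, 25]
delay_cdf = [0.25, 0.5, 0.75, 1.0]
adjusted = adjust_for_reporting_delay(reported, delay_cdf)
print(adjusted)
```

In this toy example the raw counts fall sharply in the most recent periods, but the adjusted counts do not, which shows how reporting delay alone can mimic a declining trend. Because the adjustment depends entirely on the estimated delay distribution, errors in that estimate carry over directly to the incidence estimates.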
