Modeling Reporting Delays and Reporting Corrections in Cancer Registry Data

The Surveillance, Epidemiology, and End Results (SEER) program of the National Cancer Institute is an authoritative source of cancer incidence statistics in the United States. The SEER program is a consortium of population-based cancer registries from different areas of the country. Each registry is charged with collecting data on all cancers that occur within its geographic area. As with any disease registry, there is a delay between the time that the disease (cancer) is first diagnosed and the time that it is reported to the registry. The SEER program has allowed for reporting delays of up to 19 months before releasing data for public use. Nevertheless, additional cases are discovered after the 19-month delay, and these cases are added in subsequent releases of the data. Further, any errors discovered are corrected in subsequent releases. Such reporting delays and corrections typically lead to underestimation of the cancer incidence rates in recent diagnosis years, making it difficult to monitor trends. In this article we study models that account for reporting delays and corrections when predicting the eventual cancer count for a diagnosis year from the preliminary counts. Models of this type have been studied previously, especially as applied to AIDS registries. We offer several additions to existing models. First, we explicitly model the reporting corrections. Second, we model the delay distribution with very general models, combining aspects of previous nonparametric-like models (i.e., models that have a separate parameter for each delay time) with more parametric models. Third, we allow random reporting-year effects in the model. Practical issues of model selection and of how the data are classified are also discussed, particularly how the definition of a reporting correction may change depending on how subpopulations are defined. An example with SEER melanoma data is studied in detail.
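The basic idea of predicting an eventual count from a preliminary one can be illustrated with a minimal sketch. It is not the model studied in the article: it ignores reporting corrections and random reporting-year effects, and simply divides the preliminary count by the estimated fraction of cases reported so far. The function name `delay_adjusted_count` and all numbers are hypothetical, chosen only for illustration.

```python
def delay_adjusted_count(preliminary_count, delay_probs, releases_seen):
    """Scale a preliminary count up to a predicted eventual count.

    delay_probs[d] is an assumed probability that a case is first reported
    in the d-th data release after the standard delay (d = 0 is the first
    release). releases_seen is the number of releases that have already
    included this diagnosis year.
    """
    if not 0 < releases_seen <= len(delay_probs):
        raise ValueError("releases_seen must be between 1 and len(delay_probs)")
    # Estimated fraction of eventual cases that have been reported so far.
    fraction_reported = sum(delay_probs[:releases_seen])
    if fraction_reported <= 0:
        raise ValueError("fraction reported so far must be positive")
    return preliminary_count / fraction_reported


# Hypothetical delay distribution: 90% of cases appear in the first release,
# and the remainder trickle in over the next three releases.
delay_probs = [0.90, 0.05, 0.03, 0.02]

# A diagnosis year with 900 cases in its first release.
adjusted = delay_adjusted_count(900, delay_probs, releases_seen=1)
print(f"Predicted eventual count: {adjusted:.1f}")
```

In this simplified setting, the adjustment shrinks as more releases accumulate, which is why the most recent diagnosis years are the ones most affected by underestimation.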
