A method for imputing missing data in longitudinal studies.

PURPOSE: In a cohort in which racial data are unknown for some persons, race-specific persons and person-years are imputed using a model-based iterative allocation algorithm (IAA).

METHODS: An EM algorithm-based approach to address misclassification in a censored data regression setting can be adapted to estimate the probability that a person of unknown race is white. The corresponding race-specific person-years are obtained as a by-product of the estimation procedure. Variance estimates are computed using the bootstrap. The proposed approach is compared with the proportional allocation method (PAM).

RESULTS: In an occupational cohort where racial data were missing for 41% of the workers, the age-time-race-specific person-years were estimated within a relative variation of approximately 20%, using the IAA. The deaths were less reliably estimated. The standardized mortality ratios (SMRs) for all-cause mortality estimated using the IAA and the PAM were more similar for the non-white workers than for a smaller subgroup of white workers.

CONCLUSIONS: The IAA provides a method to reliably estimate race-specific person-year denominators in cohort studies with missing racial data. This method is applicable to other incompletely observed non-time-dependent categorical covariates. Internal cohort rates or SMRs can be computed and modeled, with bootstrap confidence intervals that account for the uncertainty in the determination of race.
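As a rough illustration of the comparison baseline and the variance step, the following sketch shows one plausible reading of proportional allocation together with a percentile bootstrap. It is not the authors' implementation and it does not implement the IAA itself. The record layout (`race`, `py`), the function names, and the choice to allocate by the share of white workers among those of known race (rather than within age-time strata) are all assumptions made for this example.

```python
import random

def proportional_allocation(records):
    """Split person-years of unknown-race workers by the white share
    observed among known-race workers (assumed, unstratified form of PAM)."""
    known = [r for r in records if r["race"] is not None]
    if not known:
        raise ValueError("need at least one worker of known race")
    p_white = sum(r["race"] == "white" for r in known) / len(known)

    person_years = {"white": 0.0, "nonwhite": 0.0}
    for r in records:
        if r["race"] is None:
            # Fractional allocation: each unknown worker contributes to both groups.
            person_years["white"] += p_white * r["py"]
            person_years["nonwhite"] += (1.0 - p_white) * r["py"]
        else:
            person_years[r["race"]] += r["py"]
    return person_years

def bootstrap_interval(records, n_boot=500, alpha=0.05, seed=0):
    """Percentile bootstrap interval for white person-years, resampling workers
    with replacement and repeating the allocation on each resample."""
    rng = random.Random(seed)
    estimates = []
    for _ in range(n_boot):
        sample = [rng.choice(records) for _ in records]
        if all(r["race"] is None for r in sample):
            continue  # allocation is undefined without any known-race worker
        estimates.append(proportional_allocation(sample)["white"])
    estimates.sort()
    lower = estimates[int(alpha / 2 * len(estimates))]
    upper = estimates[int((1 - alpha / 2) * len(estimates)) - 1]
    return lower, upper

# Hypothetical toy cohort: person-years per worker, race None when unknown.
cohort = [
    {"race": "white", "py": 10.0},
    {"race": "white", "py": 20.0},
    {"race": "nonwhite", "py": 30.0},
    {"race": None, "py": 12.0},
]
allocated = proportional_allocation(cohort)
interval = bootstrap_interval(cohort)
```

Because the allocation is repeated inside every bootstrap resample, the resulting interval reflects the uncertainty in assigning race as well as sampling variability, which is the property the abstract attributes to its bootstrap confidence intervals.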
