Using the whole cohort in the analysis of case-cohort data.

Case-cohort data analyses often ignore valuable information on cohort members not sampled as cases or controls. The Atherosclerosis Risk in Communities (ARIC) study investigators, for example, typically report data for just the 10%-15% of subjects sampled for substudies of their cohort of 15,972 participants. Remaining subjects contribute to stratified sampling weights only. Analysis methods implemented in the freely available R statistical system (http://cran.r-project.org/) make better use of the data through adjustment of the sampling weights via calibration or estimation. By reanalyzing data from an ARIC study of coronary heart disease and simulations based on data from the National Wilms Tumor Study, the authors demonstrate that such adjustment can dramatically improve the precision of hazard ratios estimated for baseline covariates known for all subjects. Adjustment can also improve precision for partially missing covariates, those known for substudy participants only, when their values may be imputed with reasonable accuracy for the remaining cohort members. Links are provided to software, data sets, and tutorials showing in detail the steps needed to carry out the adjusted analyses. Epidemiologists are encouraged to consider use of these methods to enhance the accuracy of results reported from case-cohort analyses.
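The idea of adjusting sampling weights by calibration can be illustrated with a small sketch. It is not the authors' implementation (the abstract points to R software for that); it is a plain-Python illustration of linear calibration under assumed, simulated data. The function name `calibrate_linear`, the auxiliary variable `z`, the cohort size, and the sampling fractions are all hypothetical. All cases are sampled with weight 1, non-cases are subsampled, and the design weights are then adjusted so that the weighted totals of the auxiliary variables in the sample match the totals known for the whole cohort.

```python
import random


def calibrate_linear(design_weights, aux_sample, aux_totals):
    """Linear calibration with two auxiliary columns (x1, x2).

    Returns weights w_i = d_i * (1 + x_i . lam), where lam is chosen so that
    sum_i w_i * x_i equals the known cohort totals in aux_totals.
    """
    t11 = t12 = t22 = 0.0  # entries of T = sum_i d_i * x_i x_i^T
    s1 = s2 = 0.0          # design-weighted totals of x1 and x2
    for d, (x1, x2) in zip(design_weights, aux_sample):
        t11 += d * x1 * x1
        t12 += d * x1 * x2
        t22 += d * x2 * x2
        s1 += d * x1
        s2 += d * x2
    # Shortfall between the known cohort totals and the weighted sample totals.
    r1 = aux_totals[0] - s1
    r2 = aux_totals[1] - s2
    # Solve the 2x2 system T lam = r by the explicit inverse.
    det = t11 * t22 - t12 * t12
    lam1 = (t22 * r1 - t12 * r2) / det
    lam2 = (t11 * r2 - t12 * r1) / det
    return [d * (1.0 + x1 * lam1 + x2 * lam2)
            for d, (x1, x2) in zip(design_weights, aux_sample)]


# Simulated cohort (assumption): z is a baseline covariate known for everyone.
random.seed(1)
N = 2000
cohort = [(random.gauss(0.0, 1.0), random.random() < 0.05) for _ in range(N)]

cases = [s for s in cohort if s[1]]
noncases = [s for s in cohort if not s[1]]
n_sub = 200
subcohort = random.sample(noncases, n_sub)  # subsample of non-cases

# Design weights: 1 for cases (all sampled), N0/n0 for sampled non-cases.
sample = cases + subcohort
design_w = [1.0] * len(cases) + [len(noncases) / n_sub] * n_sub

# Auxiliary vector x = (1, z); its cohort totals are known exactly.
aux = [(1.0, z) for z, _ in sample]
totals = (float(N), sum(z for z, _ in cohort))

calibrated_w = calibrate_linear(design_w, aux, totals)
weighted_z_before = sum(d * x[1] for d, x in zip(design_w, aux))
weighted_z_after = sum(w * x[1] for w, x in zip(calibrated_w, aux))
```

After calibration, `weighted_z_after` agrees with the cohort total of `z` up to rounding, whereas `weighted_z_before` generally does not. Precision improves to the extent that the calibration variables are correlated with the quantities being estimated.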
