Using full‐cohort data in nested case–control and case–cohort studies by multiple imputation

In many large prospective cohorts, expensive exposure measurements cannot be obtained for all individuals. Exposure-disease association studies are therefore often based on nested case-control or case-cohort studies in which complete information is obtained only for sampled individuals. However, in the full cohort, there may be a large amount of information on cheaply available covariates and possibly a surrogate of the main exposure(s), which typically goes unused. We view the nested case-control or case-cohort study plus the remainder of the cohort as a full-cohort study with missing data. Hence, we propose using multiple imputation (MI) to utilise information in the full cohort when data from the sub-studies are analysed. We use the fully observed data to fit the imputation models. We consider using approximate imputation models and also using rejection sampling to draw imputed values from the true distribution of the missing values given the observed data. Simulation studies show that using MI to utilise full-cohort information in the analysis of nested case-control and case-cohort studies can result in important gains in efficiency, particularly when a surrogate of the main exposure is available in the full cohort. In simulations, this method outperforms counter-matching in nested case-control studies and a weighted analysis for case-cohort studies, both of which use some full-cohort information. Approximate imputation models perform well except when there are interactions or non-linear terms in the outcome model, where imputation using rejection sampling works well.
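The rejection-sampling step described above can be illustrated with a minimal sketch. It is not the authors' implementation, and it makes several simplifying assumptions that are not in the source: a logistic outcome model with an interaction term stands in for the proportional hazards model, the missing exposure `x` given the fully observed covariate `z` is taken to be normal, and all parameter values and function names (`outcome_prob`, `draw_missing_x`) are hypothetical. In practice the parameters would be estimated from the data rather than fixed. The idea shown is that a candidate value is proposed from the distribution of `x` given `z` and accepted with probability equal to the outcome-model likelihood, which is bounded by 1, so accepted values follow the distribution of `x` given both `z` and the outcome `y`.

```python
import math
import random


def outcome_prob(y, x, z, beta):
    """P(Y = y | x, z) under an assumed logistic outcome model with an x*z interaction."""
    b0, bx, bz, bxz = beta
    linear = b0 + bx * x + bz * z + bxz * x * z
    p1 = 1.0 / (1.0 + math.exp(-linear))
    return p1 if y == 1 else 1.0 - p1


def draw_missing_x(y, z, beta, gamma, sigma, rng, max_tries=10000):
    """Draw x from f(x | y, z), proportional to P(y | x, z) * f(x | z), by rejection sampling.

    Proposal: x | z ~ Normal(g0 + gz * z, sigma^2)  (an assumption of this sketch).
    A proposal is accepted with probability P(y | x, z), which never exceeds 1.
    """
    g0, gz = gamma
    for _ in range(max_tries):
        x = rng.gauss(g0 + gz * z, sigma)
        if rng.random() < outcome_prob(y, x, z, beta):
            return x
    raise RuntimeError("no proposal accepted within max_tries")


# Hypothetical parameter values, for illustration only.
rng = random.Random(2024)
beta = (-1.0, 1.0, 0.5, 0.3)   # outcome model: intercept, x, z, x*z
gamma = (0.0, 0.5)             # exposure model: intercept, z
# Five imputed values of x for one individual with y = 1 and z = 0.2.
imputations = [draw_missing_x(1, 0.2, beta, gamma, 1.0, rng) for _ in range(5)]
```

Because each draw is conditioned on the outcome model itself, interactions and non-linear terms in that model are respected automatically, which is the situation in which the abstract reports that approximate imputation models perform less well.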
