A general regression framework for a secondary outcome in case-control studies.

Modern case–control studies typically involve the collection of data on a large number of outcomes, often at considerable logistical and monetary expense. These data are of potentially great value to subsequent researchers, who, although not necessarily concerned with the disease that defined the case series in the original study, may want to use the available information for a regression analysis involving a secondary outcome. Because cases and controls are selected with unequal probability, regression analysis involving a secondary outcome generally must acknowledge the sampling design. In this paper, the author presents a new framework for the analysis of secondary outcomes in case–control studies. The approach is based on a careful re-parameterization of the conditional model for the secondary outcome given the case–control outcome and regression covariates, in terms of (a) the population regression of interest of the secondary outcome given covariates and (b) the population regression of the case–control outcome on covariates. The error distribution for the secondary outcome given covariates and case–control status is otherwise unrestricted. For a continuous outcome, the approach sometimes reduces to extending model (a) by including a residual of (b) as a covariate. However, the framework is general in the sense that models (a) and (b) can take any functional form, and the methodology allows for an identity, log or logit link function for model (a).
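The continuous-outcome special case mentioned above can be illustrated with a small simulation. The following is a minimal sketch under stated assumptions, not the author's estimator: it assumes an identity link, a constant mean difference `gamma` between cases and controls given the covariate, and it plugs in the true population risk `p` for model (b) rather than estimating it. All variable names and parameter values are hypothetical. Under these assumptions E[Y | D, X] = beta0 + beta1·X + gamma·(D − Pr(D = 1 | X)), so averaging over D recovers the population regression E[Y | X] = beta0 + beta1·X.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical population: covariate x, case-control outcome d, secondary outcome y.
n = 200_000
x = rng.normal(size=n)
p = 1.0 / (1.0 + np.exp(-(-3.0 + 1.0 * x)))  # model (b): Pr(D = 1 | X), assumed known here
d = rng.binomial(1, p)

# Model (a): E[Y | X] = beta0 + beta1 * X. The term gamma * (d - p) has mean zero
# given X, so it changes E[Y | D, X] without changing E[Y | X].
beta0, beta1, gamma = 1.0, 0.5, 2.0
y = beta0 + beta1 * x + gamma * (d - p) + rng.normal(size=n)

# Case-control sample: every case plus an equal number of randomly drawn controls.
cases = np.flatnonzero(d == 1)
controls = rng.choice(np.flatnonzero(d == 0), size=cases.size, replace=False)
idx = np.concatenate([cases, controls])
xs, ds, ys, ps = x[idx], d[idx], y[idx], p[idx]


def ols(design, response):
    """Ordinary least squares coefficients via numpy.linalg.lstsq."""
    coef, *_ = np.linalg.lstsq(design, response, rcond=None)
    return coef


ones = np.ones(idx.size)
# Naive fit: regress y on x in the case-control sample, ignoring the design.
naive = ols(np.column_stack([ones, xs]), ys)
# Adjusted fit: extend model (a) with the residual of model (b), d - Pr(D = 1 | X).
adjusted = ols(np.column_stack([ones, xs, ds - ps]), ys)

print("naive slope:   ", naive[1])
print("adjusted slope:", adjusted[1])
print("residual coef.:", adjusted[2])
```

In this sketch the naive slope drifts away from `beta1` because cases are over-sampled, while the fit that includes the residual `d - p` stays close to it. In a real case-control sample the population risk in model (b) is not observed directly and would have to be estimated, for example using known sampling fractions or disease prevalence.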
