A Note on Penalized Regression Spline Estimation in the Secondary Analysis of Case-Control Data

Primary analysis of case-control studies focuses on the relationship between disease (D) and a set of covariates of interest (Y, X). A secondary use of case-control data, often invoked in modern genetic epidemiologic association studies, is to investigate the relationships among the covariates themselves. This task is complicated by the case-control sampling, and to avoid the bias induced by the sampling design, it is typical to use only the control data. In this paper, we develop penalized regression spline methodology that uses all the data and improves the precision of estimation compared with using only the controls. A simulation study and an empirical example illustrate the methodology.
