A hybrid parametric and empirical likelihood model for evaluating interactions in case-control studies.

The case-control design provides an effective way to collect covariate information conditional on subjects’ disease status. The standard logistic regression model can be used to model the interaction between two covariates under such a design, but the prospective logistic regression method may not be the most efficient when appropriate constraints can be imposed on the covariate distribution. We develop a hybrid approach for statistical inference on the interaction under the case-control design. We use a parametric model to characterize the conditional distribution of one covariate given the other covariate in the control population, while leaving the distribution of the latter covariate fully nonparametric. A maximum hybrid parametric and empirical likelihood method is adopted to estimate all parameters. The estimator and the associated test derived from the proposed semiparametric model are suitable for evaluating the interaction between two covariates of various types (discrete or continuous). Asymptotic results for both the estimators and the test statistics are established, and the advantages of the proposed method over existing ones are demonstrated through simulation results and a real data example.
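
For orientation, the baseline that the abstract compares against, prospective logistic regression with an interaction term fitted to case-control data, can be sketched as follows. This is not the proposed hybrid likelihood method. The simulation design (one discrete and one continuous covariate, independent in the source population), the coefficient values, the sample sizes, and the function names `fit_logistic` and `simulate_case_control` are all assumptions made for illustration.

```python
import numpy as np


def fit_logistic(X, d, n_iter=50, tol=1e-10):
    """Fit a logistic regression by Newton-Raphson; return (beta, standard errors)."""
    beta = np.zeros(X.shape[1])
    for _ in range(n_iter):
        p = 1.0 / (1.0 + np.exp(-X @ beta))
        w = p * (1.0 - p)
        hessian = X.T @ (X * w[:, None])          # observed information matrix
        step = np.linalg.solve(hessian, X.T @ (d - p))
        beta += step
        if np.max(np.abs(step)) < tol:
            break
    p = 1.0 / (1.0 + np.exp(-X @ beta))
    w = p * (1.0 - p)
    cov = np.linalg.inv(X.T @ (X * w[:, None]))   # inverse information at the estimate
    return beta, np.sqrt(np.diag(cov))


def simulate_case_control(n_cases, n_controls, beta, rng, pop_size=200_000):
    """Draw a hypothetical source population, then sample cases and controls from it."""
    g = rng.binomial(2, 0.3, size=pop_size).astype(float)  # discrete covariate (assumed)
    e = rng.normal(size=pop_size)                          # continuous covariate (assumed)
    eta = beta[0] + beta[1] * g + beta[2] * e + beta[3] * g * e
    d = rng.random(pop_size) < 1.0 / (1.0 + np.exp(-eta))
    cases = rng.choice(np.flatnonzero(d), n_cases, replace=False)
    controls = rng.choice(np.flatnonzero(~d), n_controls, replace=False)
    idx = np.concatenate([cases, controls])
    return g[idx], e[idx], d[idx].astype(float)


rng = np.random.default_rng(0)
true_beta = np.array([-3.0, 0.3, 0.4, 0.5])       # illustrative values only
g, e, d = simulate_case_control(1000, 1000, true_beta, rng)

# Design matrix: intercept, two main effects, and their product (the interaction).
X = np.column_stack([np.ones_like(g), g, e, g * e])
beta_hat, se = fit_logistic(X, d)

wald_z = beta_hat[3] / se[3]                      # Wald statistic for the interaction
print("interaction estimate:", beta_hat[3], "SE:", se[3], "Wald z:", wald_z)
```

Under case-control sampling the fitted intercept does not estimate the population intercept, but the slope and interaction coefficients remain interpretable as log odds ratios, which is why the sketch reports only the interaction term. The abstract's point is that this baseline ignores any structure in the covariate distribution, which the hybrid approach is designed to exploit.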
