A Semiparametric Empirical Likelihood Method for Biased Sampling Schemes with Auxiliary Covariates

We consider a semiparametric inference procedure for data from epidemiologic studies conducted with a two-component sampling scheme, in which both a simple random sample and multiple outcome-dependent or outcome-/auxiliary-dependent samples are observed. This sampling scheme allows investigators to oversample subpopulations believed to carry more information about the regression model while still gaining insight into the underlying population through the simple random sample. We focus on settings where no additional information about the parent cohort is available and the sampling probability is nonidentifiable. We motivate the problem with an ongoing study assessing the association between the mutation level of epidermal growth factor receptor (EGFR) and the antitumor response to EGFR-targeted therapy among non-small cell lung cancer patients. The proposed method applies to both binary and multicategorical outcome data and allows an arbitrary link function within the framework of generalized linear models. Simulation studies show that the proposed estimator performs well in small samples. The proposed method is illustrated with a data example.
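To make the two-component sampling scheme concrete, the following is a minimal simulation sketch, not the authors' procedure. It assumes a binary outcome, a single standard normal covariate, a logistic link, and illustrative sample sizes; the function names `simulate_population` and `two_component_sample` and all parameter values are hypothetical choices made for this example. It only generates data of the kind described above and does not implement the empirical likelihood estimator.

```python
import math
import random


def simulate_population(n, beta0, beta1, rng):
    """Draw (x, y) pairs with x ~ N(0, 1) and P(y = 1 | x) = expit(beta0 + beta1 * x).

    The logistic model and the single covariate are assumptions for illustration.
    """
    data = []
    for _ in range(n):
        x = rng.gauss(0.0, 1.0)
        p = 1.0 / (1.0 + math.exp(-(beta0 + beta1 * x)))
        y = 1 if rng.random() < p else 0
        data.append((x, y))
    return data


def two_component_sample(population, n_srs, n_supp, rng):
    """Return a simple random sample plus outcome-dependent supplemental samples.

    n_supp maps each outcome value k to the number of extra units drawn
    from the units with y == k that were not already selected in the SRS.
    """
    idx = list(range(len(population)))
    srs_idx = set(rng.sample(idx, n_srs))
    srs = [population[i] for i in sorted(srs_idx)]

    supplements = {}
    for k, size in n_supp.items():
        pool = [i for i in idx if i not in srs_idx and population[i][1] == k]
        chosen = rng.sample(pool, size)
        supplements[k] = [population[i] for i in chosen]
    return srs, supplements


rng = random.Random(0)
population = simulate_population(5000, beta0=-2.0, beta1=1.0, rng=rng)
# Oversample responders (y = 1), which are the rarer outcome under these parameters.
srs, supplements = two_component_sample(population, 200, {0: 50, 1: 150}, rng)
```

In this sketch the analyst sees only `srs` and `supplements`; the parent population and the selection probabilities stay hidden, which mirrors the setting where the sampling probability is nonidentifiable.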
