On semiparametric efficient inference for two-stage outcome-dependent sampling with a continuous outcome

Outcome-dependent sampling designs have been shown to be a cost-effective way to enhance study efficiency. We show that the outcome-dependent sampling design with a continuous outcome can be viewed as an extension of two-stage case-control designs to the continuous-outcome setting. We further show that two-stage outcome-dependent sampling has a natural link with the missing-data and biased-sampling frameworks. Using semiparametric inference and missing-data techniques, we show that a particular semiparametric maximum likelihood estimator is computationally convenient and attains the semiparametric efficiency bound. We demonstrate this both theoretically and through simulation.
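To make the missing-data link concrete, here is a minimal sketch of one standard way to compute a semiparametric MLE in this framing; it is not claimed to be the paper's exact estimator or algorithm. Because the second-stage selection depends only on Y, which is observed for everyone, the unmeasured X values are missing at random and the selection probabilities drop out of the likelihood. The sketch maximizes the product of f(y | x; beta) g(x) over units with X measured and the integral of f(y | x; beta) dG(x) over units without X, using EM with G placed on the observed second-stage X values. Everything specific here is an assumption for illustration: a normal linear model, a discrete covariate X ~ Binomial(3, 0.4), a three-stratum Y-dependent subsampling rule, and the hypothetical function names `simulate_two_stage` and `smle_em`.

```python
import math
import random


def normal_pdf(y, mean, var):
    """Density of N(mean, var) evaluated at y."""
    return math.exp(-0.5 * (y - mean) ** 2 / var) / math.sqrt(2.0 * math.pi * var)


def simulate_two_stage(n, beta0=1.0, beta1=0.5, sigma=1.0,
                       probs=(0.8, 0.1, 0.8), seed=1):
    """Hypothetical design: Y is observed for all n units (stage 1); X is
    measured (stage 2) with a probability that depends only on the Y stratum."""
    rng = random.Random(seed)
    xs = [sum(rng.random() < 0.4 for _ in range(3)) for _ in range(n)]  # X ~ Bin(3, 0.4)
    ys = [beta0 + beta1 * x + rng.gauss(0.0, sigma) for x in xs]
    ordered = sorted(ys)
    lo, hi = ordered[int(0.2 * n)], ordered[int(0.8 * n)]  # lower/upper tail cutpoints
    data = []
    for x, y in zip(xs, ys):
        stratum = 0 if y < lo else (2 if y > hi else 1)
        measured = rng.random() < probs[stratum]
        data.append((y, x if measured else None))  # None: X not measured
    return data


def smle_em(data, max_iter=300, tol=1e-8):
    """EM for the SMLE of (beta0, beta1, sigma^2) and a discrete G for X.
    Selection depends only on Y, so X is missing at random and the
    selection probabilities do not enter the likelihood."""
    valid = [(y, x) for y, x in data if x is not None]
    missing = [y for y, x in data if x is None]
    n_total = len(data)
    support = sorted({x for _, x in valid})  # G is placed on observed X values
    p = {k: sum(1 for _, x in valid if x == k) / len(valid) for k in support}
    ys = [y for y, _ in data]
    b0, b1 = sum(ys) / n_total, 0.0
    var = sum((y - b0) ** 2 for y in ys) / n_total
    loglik_path = []
    for _ in range(max_iter):
        # Observed-data log-likelihood, plus E-step weights P(X = k | y) for missing X.
        loglik = sum(math.log(p[x] * normal_pdf(y, b0 + b1 * x, var)) for y, x in valid)
        rows = [(y, x, 1.0) for y, x in valid]
        for y in missing:
            dens = [p[k] * normal_pdf(y, b0 + b1 * k, var) for k in support]
            total = sum(dens)
            loglik += math.log(total)
            rows.extend((y, k, d / total) for k, d in zip(support, dens))
        loglik_path.append(loglik)
        # M-step for G: p_k = (observed count + expected count) / N.
        p = {k: 0.0 for k in support}
        for _, x, w in rows:
            p[x] += w / n_total
        # M-step for beta and sigma^2: weighted least squares on the pseudo-data.
        sw = sum(w for _, _, w in rows)
        sx = sum(w * x for _, x, w in rows)
        sxx = sum(w * x * x for _, x, w in rows)
        sy = sum(w * y for y, _, w in rows)
        sxy = sum(w * x * y for y, x, w in rows)
        b1 = (sw * sxy - sx * sy) / (sw * sxx - sx * sx)
        b0 = (sy - b1 * sx) / sw
        var = sum(w * (y - b0 - b1 * x) ** 2 for y, x, w in rows) / sw
        if len(loglik_path) > 1 and loglik_path[-1] - loglik_path[-2] < tol:
            break
    return (b0, b1, var), p, loglik_path


data = simulate_two_stage(4000, seed=7)
(b0, b1, var), g_hat, path = smle_em(data)
print(f"beta0={b0:.3f} beta1={b1:.3f} sigma2={var:.3f} iterations={len(path)}")
```

Each EM step has a closed form here (a weighted average for G, weighted least squares for beta), which is one reading of "computationally convenient". By contrast, an ordinary least-squares fit on the measured units alone is generally biased under this design, since that subsample is selected on Y.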
