An Estimated Likelihood Method for Continuous Outcome Regression Models With Outcome-Dependent Sampling

Many biomedical observational studies attempt to relate a continuous outcome to an environmental exposure and other important covariates. If the outcome is easier or cheaper to measure than the exposure of interest, then the outcome may be observed for every member of a finite study population, whereas exposure measurements may be obtained only for a relatively small subsample of this population. Rather than selecting a simple random subsample of individuals for exposure measurement, investigators may attempt to enhance study efficiency by allowing the selection probabilities to depend on the observed outcomes; we refer to such sampling schemes as outcome-dependent sampling (ODS). Standard estimation methods that ignore the ODS design will yield biased and inconsistent parameter estimates. Furthermore, it is generally desirable to use estimators that incorporate all available data, because analyses restricted to subjects with complete information are inefficient. To this end, we extend an estimated likelihood method, originally developed for discrete outcome measurement error problems in which accurate exposure measurements are made only for a simple random “validation” sample, to allow for continuous outcomes and ODS designs. We derive the asymptotic properties of the proposed estimator and use simulated data to show that the asymptotic results closely approximate the finite-sample properties in samples of moderate size. We also use simulated data to compare the performance of our proposed estimator with that of existing methods applicable to the ODS problem.
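To make the bias claim concrete, the following is a minimal simulation sketch, not the estimated likelihood estimator proposed here. It assumes a simple linear model, a hypothetical two-stratum ODS design that oversamples the tails of the outcome, and arbitrary selection probabilities; all names and parameter values are illustrative. It compares ordinary least squares on the ODS subsample, which ignores the design, with inverse-probability-weighted least squares, a standard design-based correction that uses only the subjects with complete information.

```python
import numpy as np


def simulate_ods(n=20000, beta0=1.0, beta1=0.5, p_tail=0.5, p_mid=0.05, seed=0):
    """Simulate a cohort and an outcome-dependent subsample (illustrative design)."""
    rng = np.random.default_rng(seed)
    x = rng.normal(size=n)                      # exposure (expensive to measure)
    y = beta0 + beta1 * x + rng.normal(size=n)  # outcome, observed for everyone
    # The outcome is known for the full cohort, so the stratum cutoffs
    # can be computed from all n values of y.
    lo, hi = y.mean() - y.std(), y.mean() + y.std()
    # Assumed design: oversample subjects in the tails of the outcome.
    p_select = np.where((y < lo) | (y > hi), p_tail, p_mid)
    selected = rng.random(n) < p_select
    return x, y, p_select, selected


def wls_slope(x, y, w):
    """Slope from weighted least squares of y on (1, x) with weights w."""
    X = np.column_stack([np.ones_like(x), x])
    XtW = X.T * w                               # scales each observation by its weight
    coef = np.linalg.solve(XtW @ X, XtW @ y)
    return coef[1]


x, y, p, s = simulate_ods()
true_slope = 0.5

# Naive analysis: unweighted regression on the ODS subsample.
naive_slope = wls_slope(x[s], y[s], np.ones(s.sum()))
# Design-based correction: weight each sampled subject by 1 / selection probability.
ipw_slope = wls_slope(x[s], y[s], 1.0 / p[s])

print(f"true slope:  {true_slope:.3f}")
print(f"naive slope: {naive_slope:.3f}")
print(f"IPW slope:   {ipw_slope:.3f}")
```

Under this assumed design the unweighted slope tends to sit well above the true value, because selecting on extreme outcomes inflates the outcome variance in the subsample, while the weighted slope stays close to it. The weighted analysis still discards the outcomes of unsampled subjects, which is the inefficiency that motivates estimators using all available data.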
