Large Sample Theory for Semiparametric Regression Models with Two-Phase, Outcome Dependent Sampling

Outcome-dependent, two-phase sampling designs can dramatically reduce the costs of observational studies by judicious selection of the most informative subjects for purposes of detailed covariate measurement. Here we derive asymptotic information bounds and the form of the efficient score and influence functions for the semiparametric regression models studied by Lawless, Kalbfleisch and Wild (1999) under two-phase sampling designs. We show that the maximum likelihood estimators for both the parametric and nonparametric parts of the model are asymptotically normal and efficient. The efficient influence function for the parametric part agrees with the more general information bound calculations of Robins, Hsieh and Newey (1995). By verifying the conditions of Murphy and van der Vaart (2000) for a least favorable parametric submodel, we provide asymptotic justification for statistical inference based on profile likelihood.
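As a rough illustration of the sampling design described above (not of the semiparametric maximum likelihood estimator studied in the paper), the following sketch simulates a two-phase, outcome-dependent sample. The binary outcome is observed for the whole cohort at phase one, and the covariate is measured at phase two only for a subsample chosen with probabilities that depend on the outcome. The logistic model, the selection probabilities, and the function names `simulate_two_phase` and `weighted_logistic` are assumptions made for this example. The fit uses simple inverse-probability weighting, a consistent but generally inefficient alternative to the efficient estimators discussed in the abstract.

```python
import numpy as np

def simulate_two_phase(n=20000, beta=(-2.0, 1.0), p_case=1.0, p_control=0.2, seed=0):
    """Simulate a cohort with a two-phase, outcome-dependent sample (illustrative only)."""
    rng = np.random.default_rng(seed)
    x = rng.normal(size=n)                            # covariate that is costly to measure
    eta = beta[0] + beta[1] * x
    y = rng.binomial(1, 1.0 / (1.0 + np.exp(-eta)))   # phase 1: outcome known for everyone
    pi = np.where(y == 1, p_case, p_control)          # selection probability depends on y only
    r = rng.random(n) < pi                            # phase 2: indicator that x is measured
    return x, y, pi, r

def weighted_logistic(x, y, w, iters=25):
    """Weighted logistic regression fitted by Newton's method."""
    X = np.column_stack([np.ones_like(x), x])
    b = np.zeros(2)
    for _ in range(iters):
        p = 1.0 / (1.0 + np.exp(-X @ b))
        grad = X.T @ (w * (y - p))                    # weighted score
        hess = (X * (w * p * (1.0 - p))[:, None]).T @ X
        b = b + np.linalg.solve(hess, grad)
    return b

x, y, pi, r = simulate_two_phase()
# Use only phase-2 subjects, weighted by the inverse of their selection probability.
beta_hat = weighted_logistic(x[r], y[r], 1.0 / pi[r])
print("phase-2 sample size:", int(r.sum()), "of", len(y))
print("IPW estimate of (intercept, slope):", beta_hat)
```

In this setup every case and about one fifth of the controls are selected at phase two, so the covariate is measured on only a fraction of the cohort while the weighted fit still targets the full-cohort regression parameters.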

[1]  A. Scott,et al.  Fitting regression models to case-control data by maximum likelihood , 1997 .

[2]  A. W. van der Vaart,et al.  On Profile Likelihood , 2000 .

[3]  Yi-Hau Chen,et al.  A Pseudoscore Estimator for Regression Problems With Two-Phase Sampling , 2003 .

[4]  A. W. van der Vaart  Efficiency of infinite dimensional M-estimators , 1995 .

[5]  Richard D. Gill,et al.  Large sample theory of empirical distributions in biased sampling models , 1988 .

[6]  Susan A. Murphy,et al.  Rejoinder to discussion of "On Profile Likelihood" , 2000 .

[7]  P. J. Huber The behavior of maximum likelihood estimates under nonstandard conditions , 1967 .

[8]  Peter B. Gilbert Large sample theory of maximum likelihood estimates in semiparametric biased sampling models , 2000 .

[9]  A. V. D. Vaart,et al.  Maximum Likelihood Estimation with Partially Censored Data , 1994 .

[10]  W. J. Hall,et al.  Information and Asymptotic Efficiency in Parametric-Nonparametric Models , 1983 .

[11]  Steven G. Self,et al.  Asymptotic Distribution Theory and Efficiency Results for Case-Cohort Studies , 1988 .

[12]  Consistency of semiparametric maximum likelihood estimators for two‐phase sampling , 2001 .

[13]  Yihui Zhan,et al.  Bootstrapping Z-Estimators , 1996 .

[14]  P. Bickel Efficient and Adaptive Estimation for Semiparametric Models , 1993 .

[15]  J. Wellner,et al.  Preservation Theorems for Glivenko-Cantelli and Uniform Glivenko-Cantelli Classes , 2000 .

[16]  A. V. D. Vaart Asymptotic Statistics: Delta Method , 1998 .

[17]  Jerald F. Lawless,et al.  Semiparametric methods for response‐selective and missing data problems in regression , 1999 .

[18]  W. Newey,et al.  The asymptotic variance of semiparametric estimators , 1994 .

[19]  J. Wellner,et al.  Existence and consistency of maximum likelihood in upgraded mixture models , 1992 .

[20]  Nilanjan Chatterjee,et al.  Design and analysis of two‐phase studies with binary outcome applied to Wilms tumour prognosis , 1999 .

[21]  D. Pollard New Ways to Prove Central Limit Theorems , 1985, Econometric Theory.

[22]  A. Jon Information Bounds for Regression Models with Missing Data , 2000 .

[23]  M. Emond,et al.  Information Bounds for Regression Models with Missing Data , 2000 .

[24]  A. .,et al.  SEMIPARAMETRIC LIKELIHOOD RATIO INFERENCE , 1996 .

[25]  Jon A. Wellner,et al.  Weak Convergence and Empirical Processes: With Applications to Statistics , 1996 .

[26]  K. Do,et al.  Efficient and Adaptive Estimation for Semiparametric Models. , 1994 .

[27]  K. Pearson Biometrika , 1902, The American Naturalist.

[28]  P. Gänssler Weak Convergence and Empirical Processes - A. W. van der Vaart; J. A. Wellner. , 1997 .

[29]  R. Pyke,et al.  Logistic disease incidence models and case-control studies , 1979 .

[30]  P. McCullagh,et al.  Generalized Linear Models, 2nd Edn. , 1990 .

[31]  J. Robins,et al.  Estimation of Regression Coefficients When Some Regressors are not Always Observed , 1994 .

[32]  Eric R. Ziegel,et al.  Generalized Linear Models , 2002, Technometrics.

[33]  Susan A. Murphy,et al.  Observed information in semi-parametric models , 1999 .

[34]  Alastair Scott,et al.  Maximum likelihood for generalised case-control studies , 2001 .

[35]  James M. Robins,et al.  Semiparametric efficient estimation of a conditional density with missing or mismeasured covariates , 1995 .

[36]  J. Robins,et al.  On the semi-parametric efficiency of logistic regression under case-control sampling , 2000 .

[37]  J. H. Schuenemeyer,et al.  Generalized Linear Models (2nd ed.) , 1992 .

[38]  Norman E. Breslow,et al.  Maximum Likelihood Estimation of Logistic Regression Parameters under Two‐phase, Outcome‐dependent Sampling , 1997 .