Semi-parametric efficiency bounds for regression models under response-selective sampling: the profile likelihood approach

We obtain an information bound for estimates of parameters in general regression models where data are collected under a variety of response-selective sampling schemes, together with a simple formula for the asymptotic variance of the semi-parametric maximum likelihood estimate. Comparing this asymptotic variance with the bound shows that the estimate is fully efficient in a variety of settings. A small simulation study illustrates the small-sample efficiency of the semi-parametric estimator.
