Efficient estimation in multi-phase case-control studies

In this paper we discuss the analysis of multi-phase, or multi-stage, case-control studies and present an efficient semiparametric maximum-likelihood approach. The approach unifies and extends earlier work, including the seminal case-control paper by Prentice & Pyke (1979) and work by Breslow & Cain (1988), Scott & Wild (1991), Breslow & Holubkov (1997) and others. The theoretical derivations apply to arbitrary binary regression models, but we present results for logistic regression. We show that the approach can be implemented by including additional intercept terms in the logistic model and then making some simple corrections to the score and information equations used in a Newton-Raphson or Fisher-scoring maximization of the prospective log-likelihood. Copyright 2010, Oxford University Press.
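As a rough illustration of why additional intercept terms arise, the sketch below checks the classical single-phase identity associated with Prentice & Pyke (1979): when cases and controls are sampled with different probabilities, the probability of being a case among sampled subjects is again logistic, with the intercept shifted by the log ratio of the sampling probabilities. This is not the estimator developed in the paper, and it does not show the corrections to the score and information equations. The function names, coefficient values, strata and sampling fractions are all hypothetical, chosen only for illustration.

```python
import math


def expit(z):
    """Inverse logit."""
    return 1.0 / (1.0 + math.exp(-z))


def prob_case_given_sampled(x, beta0, beta1, p_case, p_control):
    """P(Y=1 | x, sampled), computed directly from Bayes' rule.

    p_case and p_control are the (assumed known) probabilities that a
    case or a control is selected into the sample.
    """
    pi = expit(beta0 + beta1 * x)  # population-level P(Y=1 | x)
    numerator = p_case * pi
    return numerator / (numerator + p_control * (1.0 - pi))


def prob_case_shifted_intercept(x, beta0, beta1, p_case, p_control):
    """The same probability, written as a logistic model with an extra intercept."""
    offset = math.log(p_case / p_control)
    return expit(beta0 + offset + beta1 * x)


# Hypothetical population model and per-stratum sampling fractions
# (case fraction, control fraction); each stratum gets its own offset.
beta0, beta1 = -3.0, 0.8
strata = {"stratum_1": (1.0, 0.05), "stratum_2": (0.5, 0.20)}

max_abs_diff = max(
    abs(
        prob_case_given_sampled(x, beta0, beta1, p1, p0)
        - prob_case_shifted_intercept(x, beta0, beta1, p1, p0)
    )
    for (p1, p0) in strata.values()
    for x in (-2.0, 0.0, 1.5)
)
print(max_abs_diff)  # agreement up to floating-point rounding
```

The slope is unchanged in both forms; only the intercept absorbs the sampling design, which is why a stratified or multi-phase design can be accommodated by letting each stratum carry its own intercept term.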

[1] J. E. White, et al. A two stage design for the study of the relationship between a rare exposure and a rare disease, 1982, American journal of epidemiology.

[2] William G. Cochran, et al. Sampling Techniques, 3rd Edition, 1963.

[3] Yuichi Hirose, et al. Semi-parametric efficiency bounds for regression models under response-selective sampling: the profile likelihood approach, 2010.

[4] Norman E. Breslow, et al. Logistic regression for two-stage case-control data, 1988.

[5] A. Winsor. Sampling techniques, 2000, Nursing times.

[6] Edward Baum, et al. Treatment of Wilms' tumor. Results of the third national Wilms' tumor study, 1989, Cancer.

[7] A. Scott, et al. Fitting regression models to case-control data by maximum likelihood, 1997.

[8] A. W. van der Vaart, et al. On Profile Likelihood, 2000.

[9] A. Scott, et al. Fitting Logistic Regression Models in Stratified Case-Control Studies, 1991.

[10] Alan J. Lee. Semi-parametric efficiency bounds for regression models under choice-based sampling, 2004.

[11] Jerald F. Lawless, et al. Semiparametric methods for response-selective and missing data problems in regression, 1999.

[12] Norman E. Breslow, et al. Maximum Likelihood Estimation of Logistic Regression Parameters under Two-phase, Outcome-dependent Sampling, 1997.

[13] Norman E. Breslow, et al. Large Sample Theory for Semiparametric Regression Models with Two-Phase, Outcome Dependent Sampling, 2003.

[14] J. Anderson. Separate sample logistic discrimination, 1972.

[15] R. Pyke, et al. Logistic disease incidence models and case-control studies, 1979.

[16] Michael P. Hirsh, et al. Comparison between single-dose and divided-dose administration of dactinomycin and doxorubicin for patients with Wilms' tumor: A report from the national Wilms' tumor study group, 1998.

[17] Estimating incidence rates from population-based case-control studies in the presence of nonrespondents, 2002.

[18] Alastair Scott, et al. Fitting binary regression models with case-augmented samples, 2006.

[19] Nilanjan Chatterjee, et al. Maximum likelihood inference on a mixed conditionally and marginally specified regression model for genetic epidemiologic studies with two-phase sampling, 2007.

[20] Alastair Scott, et al. Maximum likelihood for generalised case-control studies, 2001.

[21] Michal Kulich, et al. Improving the Efficiency of Relative-Risk Estimation in Case-Cohort Studies, 2004.

[22] N. E. Breslow, et al. Comparison between single-dose and divided-dose administration of dactinomycin and doxorubicin for patients with Wilms' tumor: a report from the National Wilms' Tumor Study Group, 1998, Journal of clinical oncology: official journal of the American Society of Clinical Oncology.

[23] W. Newey, et al. The asymptotic variance of semiparametric estimators, 1994.

[24] Alan Lee, et al. On the Semiparametric Efficiency of the Scott-Wild Estimator under Choice-Based and Two-Phase Sampling, 2007, Adv. Decis. Sci.

[25] G. Ridder, et al. The Asymptotic Variance of Semi-parametric Estimators with Generated Regressors, 2010.

[26] A. Scott, et al. On the robustness of weighted methods for fitting models to case-control data, 2002.

[27] Nilanjan Chatterjee, et al. Design and analysis of two-phase studies with binary outcome applied to Wilms tumour prognosis, 1999.

[28] A. Scott, et al. On the Breslow-Holubkov estimator, 2007, Lifetime data analysis.

[29] A. Tsiatis. Semiparametric Theory and Missing Data, 2006.

[30] J. Halpern, et al. Multi-stage sampling in genetic epidemiology, 1997, Statistics in medicine.