A Multiphase Design Strategy for Dealing with Participation Bias

A recently funded study of the impact of oral contraceptive use on the risk of bone fracture employed the randomized recruitment scheme of Weinberg and Wacholder (1990, Biometrics 46, 963-975). One complication in the bone fracture study is the potential for differential response rates between cases and controls; participation rates in previous, related studies have been around 70%. Although data from randomized recruitment schemes may be analyzed within the two-phase study framework, ignoring differential participation may lead to biased estimates of association. To address this, we extend the two-phase framework by introducing an additional stage of data collection aimed specifically at differential participation. We propose four estimators that correct for both sampling and participation bias: two are general purpose, and two are for the special case where the covariates underlying the participation mechanism are discrete. Because the fracture study is ongoing, we illustrate the methods using infant mortality data from North Carolina.
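To make the idea of correcting for both sampling and participation concrete, the following is a minimal sketch of generic inverse-probability weighting. It is not necessarily any of the four estimators proposed here. It assumes a single binary exposure, a phase-two sampling probability that depends only on case status, and a participation probability that depends on case status and exposure; both probabilities are treated as known, whereas in practice the participation probabilities would have to be estimated from the additional stage of data collection. All function names and numbers are hypothetical and chosen only for illustration.

```python
import numpy as np

def ipw_weights(sampling_prob, participation_prob):
    """Weight = 1 / (P(sampled) * P(participates | sampled)). Hypothetical helper."""
    sampling_prob = np.asarray(sampling_prob, dtype=float)
    participation_prob = np.asarray(participation_prob, dtype=float)
    return 1.0 / (sampling_prob * participation_prob)

def weighted_logistic(X, y, w, n_iter=50, tol=1e-10):
    """Solve the weighted logistic score equations by Newton-Raphson."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    w = np.asarray(w, dtype=float)
    beta = np.zeros(X.shape[1])
    for _ in range(n_iter):
        p = 1.0 / (1.0 + np.exp(-X @ beta))
        score = X.T @ (w * (y - p))                      # weighted score vector
        info = (X * (w * p * (1.0 - p))[:, None]).T @ X  # weighted information matrix
        step = np.linalg.solve(info, score)
        beta = beta + step
        if np.max(np.abs(step)) < tol:
            break
    return beta

# Made-up participant counts for each (case status y, exposure x) cell.
# Assumed design: all cases sampled, 10% of controls sampled;
# participation differs by both case status and exposure.
cells = [
    # (y, x, participants, sampling_prob, participation_prob)
    (1, 1, 30, 1.0, 0.60),
    (1, 0, 40, 1.0, 0.80),
    (0, 1, 20, 0.1, 0.50),
    (0, 0, 60, 0.1, 0.75),
]

y = np.concatenate([np.repeat(c[0], c[2]) for c in cells])
x = np.concatenate([np.repeat(c[1], c[2]) for c in cells])
pi = np.concatenate([np.repeat(c[3], c[2]) for c in cells])
rho = np.concatenate([np.repeat(c[4], c[2]) for c in cells])

X = np.column_stack([np.ones_like(x, dtype=float), x])   # intercept + exposure
w = ipw_weights(pi, rho)

beta_naive = weighted_logistic(X, y, np.ones_like(w))    # ignores sampling and participation
beta = weighted_logistic(X, y, w)                        # corrects for both

print("naive log odds ratio:   ", beta_naive[1])
print("weighted log odds ratio:", beta[1])
```

In this toy setting the model is saturated, so the weighted exposure coefficient equals the log odds ratio of the reweighted 2x2 table, while the unweighted fit reproduces the odds ratio among participants only. Standard errors from such a weighted fit would need a sandwich-type variance and are not addressed in this sketch.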
