Logistic regression for two-stage case-control data

SUMMARY Samples of diseased cases and nondiseased controls are drawn at random from the population at risk. After classification according to the exposure of interest, subsamples of cases and controls are selected for covariable ascertainment. A modification of the usual logistic regression analysis yields consistent estimates of covariable-adjusted relative risks and their standard errors. By balancing the numbers of exposed and nonexposed subjects for whom covariable information is ascertained within the case and control samples, some efficiency may be gained over the usual single-stage design, particularly when the exposure is rare and the relative risks associated with the covariables are large. The procedure may also be useful when covariable information is missing for a large part of the sample.
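
The following is a minimal sketch of the two-stage design, not the estimator itself. It assumes that the subsampling can be accounted for by adding a stratum-specific offset, the log ratio of case to control second-stage sampling fractions, to the logistic model; the summary does not state the form of the modification, so this is an assumption. All counts and the name `stage_two_offsets` are hypothetical, and the adjustment of standard errors is not shown.

```python
import math

def stage_two_offsets(first_stage, second_stage):
    """Offset per exposure stratum j: log[(n_1j / N_1j) / (n_0j / N_0j)].

    N_ij are first-stage counts and n_ij are second-stage (subsample) counts,
    with i = 1 for cases and i = 0 for controls. Assumed form, for illustration.
    """
    offsets = {}
    for j in first_stage["cases"]:
        case_frac = second_stage["cases"][j] / first_stage["cases"][j]
        control_frac = second_stage["controls"][j] / first_stage["controls"][j]
        offsets[j] = math.log(case_frac / control_frac)
    return offsets

# Hypothetical first-stage counts, classified by exposure only (rare exposure).
first_stage = {
    "cases": {"exposed": 100, "unexposed": 900},
    "controls": {"exposed": 50, "unexposed": 950},
}
# Balanced second stage: equal numbers of exposed and unexposed within
# the case sample and within the control sample.
second_stage = {
    "cases": {"exposed": 100, "unexposed": 100},
    "controls": {"exposed": 50, "unexposed": 50},
}

offsets = stage_two_offsets(first_stage, second_stage)

def log_odds(counts, j):
    """Log odds of being a case within exposure stratum j."""
    return math.log(counts["cases"][j] / counts["controls"][j])

# The crude second-stage exposure log odds ratio is distorted by balancing.
crude_log_or = log_odds(second_stage, "exposed") - log_odds(second_stage, "unexposed")

# Subtracting the offsets recovers the first-stage exposure log odds ratio.
corrected_log_or = (
    (log_odds(second_stage, "exposed") - offsets["exposed"])
    - (log_odds(second_stage, "unexposed") - offsets["unexposed"])
)
first_stage_log_or = log_odds(first_stage, "exposed") - log_odds(first_stage, "unexposed")
```

With no covariables the correction is pure arithmetic, which makes the role of the offsets easy to check; in a regression with covariables, the same offsets would enter the linear predictor of the second-stage fit.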