The proportional hazards model for survey data from independent and clustered super-populations

Data from most complex surveys are subject to selection bias and clustering induced by the sampling design, so results developed for a random sample from a super-population model may not apply. Ignoring the survey sampling weights can lead to biased estimators and erroneous confidence intervals. In this paper, we use the design-based approach to fit the proportional hazards (PH) model and formally prove the asymptotic normality of the sample maximum partial likelihood (SMPL) estimators under the PH model, for both stochastically independent and clustered failure times. In the independent case, we use the central limit theorem for martingales in the joint design-model space, which yields results for a general multistage sampling design under mild and easily verifiable conditions. In the clustered case, we require asymptotic normality directly in the sampling design space, which holds for fewer sampling designs than in the independent case. We also propose a variance estimator for the SMPL estimator. A key property of this variance estimator is that the second-stage correlation model does not have to be specified.
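To make the role of the sampling weights concrete, the sketch below evaluates one common design-weighted form of the log partial likelihood, in which each sampled unit i carries a weight w_i both in its own event term and in every risk-set sum it belongs to. This form, the Breslow-type handling of tied failure times, and the names `weighted_partial_loglik`, `covariates` and `weights` are assumptions made for illustration; the abstract does not give the exact definition of the SMPL estimator, so this should not be read as the paper's own formulation.

```python
import math


def weighted_partial_loglik(beta, times, events, covariates, weights):
    """Design-weighted Breslow-type log partial likelihood (illustrative sketch).

    beta       : list of regression coefficients
    times      : observed follow-up times, one per sampled unit
    events     : 1 if the unit failed at its time, 0 if censored
    covariates : list of covariate rows, one row per sampled unit
    weights    : survey weights w_i (assumed here: inverse inclusion probabilities)
    """
    # Linear predictor x_i' beta for every sampled unit.
    eta = [sum(b * x for b, x in zip(beta, row)) for row in covariates]

    loglik = 0.0
    for i, (t_i, d_i) in enumerate(zip(times, events)):
        if not d_i:
            continue  # censored units contribute only through the risk sets
        # Weighted risk-set sum over units still under observation at t_i.
        s0 = sum(
            w * math.exp(e)
            for t, w, e in zip(times, weights, eta)
            if t >= t_i
        )
        loglik += weights[i] * (eta[i] - math.log(s0))
    return loglik


# Hypothetical toy data: four sampled units, one covariate.
times = [2.0, 3.0, 5.0, 7.0]
events = [1, 0, 1, 1]
covariates = [[0.5], [1.0], [-0.3], [0.2]]
weights = [1, 2, 3, 1]

value = weighted_partial_loglik([0.4], times, events, covariates, weights)
```

Maximizing this function over `beta` would give a weighted point estimate under the stated assumptions. It says nothing about the variance: treating the weights as frequency counts would generally misstate the uncertainty, which is why a design-based variance estimator such as the one proposed in the paper is needed.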