Auxiliary covariate data in failure time regression

SUMMARY We consider the problem of missing covariate data in censored failure time relative risk regression. Auxiliary covariate data, which are considered informative about the missing covariates but which are not explicitly part of the relative risk regression model, may be available. Full covariate information is available only for a subset of subjects, the validation set. An estimated partial likelihood method is proposed for estimating the relative risk parameters. This method extends the estimated likelihood regression analysis method for uncensored data (Pepe, 1992; Pepe & Fleming, 1991). A key feature of the method is that it is nonparametric with respect to the association between the missing covariate components and the observed ones, including the auxiliary covariates. Asymptotic distribution theory is derived for the proposed estimated partial likelihood estimator in the case where the auxiliary or mismeasured covariates are categorical. Asymptotic efficiencies are calculated for exponential failure times under an exponential relative risk model. The estimated partial likelihood estimator compares favourably with a fully parametric maximum likelihood analysis. Comparisons are also made with a standard partial likelihood analysis that ignores the incomplete observations. Important efficiency gains can be made with the estimated partial likelihood method. Small sample properties are investigated through simulation studies.