Covariate Bias Induced by Length-Biased Sampling of Failure Times

Although many authors have proposed different approaches to the analysis of length-biased survival data, a number of issues have not been fully addressed. Perhaps the most important of these is the inclusion of covariates in the analysis of length-biased lifetime data collected through cross-sectional sampling of a population. One aspect of this problem, which appears to have been neglected in the literature, concerns the effect of length bias on the sampling distribution of the covariates. In most regression analyses, it is conventional to condition on the observed covariate values; however, certain covariate values could be preferentially selected into the sample, being linked to the long-term survivors, who themselves are favored by the sampling mechanism. This observation raises two questions: (1) Does the conditional analysis of covariates lead to biased estimators of regression coefficients? and (2) Does inference through the joint likelihood of covariates and failure times yield more efficient estimators of the regression parameters? We present a joint likelihood approach and study the large-sample behavior of the resulting maximum likelihood estimators (MLEs). We find that these MLEs are more efficient than their conditional counterparts even though the two MLEs are asymptotically equal. Our results are illustrated using data on survival with dementia, collected as part of the Canadian Study of Health and Aging.