The central limit theorem under censoring

A number of recent papers have put the finishing touches on the asymptotic theory of the Kaplan-Meier estimator (Kaplan and Meier 1958) and of functionals based on it (see Wang 1987; Gijbels and Veraverbeke 1991; Einmahl and Koning 1992; Gill 1994; Stute 1995). The last paper cited establishes a central limit theorem (CLT) for a Kaplan-Meier integral by first expressing it as a sum of independent and identically distributed (i.i.d.) random variables plus an asymptotically negligible remainder term. Stute's result allows both discontinuous populations and a general class of functions, thus generalizing other CLT results (Gill 1983; Schick et al. 1988; Yang 1994). However, it is obtained using a delicate (and computation-intensive) approach based on U-statistic approximations. Stute justifies this approach by citing difficulties in applying counting process techniques, and his approach requires stronger assumptions than those used with martingale methods. In addition, the expression for the terms in his i.i.d. representation (and consequently for the asymptotic variance) is quite complicated, especially for distributions with atoms.
The main purpose of the present paper is to prove the CLT, and to provide an alternative i.i.d. representation with simpler terms, under weaker conditions. This is made possible by using the martingale methods developed by Gill (1980, 1983) and the identities and inequalities of Efron and Johnstone (1990). With the present approach the CLT is established directly, not as a consequence of the i.i.d. representation. These techniques require that the Kaplan-Meier integral be re-expressed as an integral with respect to the cumulative hazard function. Efron and Johnstone (1990) studied the consequences of such re-expressions extensively in the uncensored data context, and their results are central to understanding the relation between the expressions given here and those of Stute (1995).
It will be seen that the variance expression in the present paper is related to Efron and Johnstone's 'advance time' transformation A, while the terms in our i.i.d. representation are related to its adjoint transformation B. Surprisingly, expressions resulting from these transformations do not change much under random censoring, while the traditional