Incorporating retrospective data into an analysis of time to illness.

For studies of time to illness, the prospective cohort study is, in general, the method of choice. When the time of origin is known for all subjects, then a prevalent cohort study in which individuals are recruited at variable times after the start of the illness process is a suitable alternative. Often, when a prevalent cohort is being formed, data may also be available on individuals who are already ill but are alive. The incorporation of such data, which is practically appealing to many researchers, is discussed. The nature of the required assumptions and the need also to model the illness to death process are illustrated. Full likelihood and pseudolikelihood techniques are outlined and compared with each other and with the use of only prevalent cohort data in a small simulation study. An example based on an HIV seroconverter study is discussed for illustration. The full likelihood method is seen to be too complex for general application. The use of pseudolikelihoods is easier to implement. If there is reliable information on initiating event times and recruitment strategies are well defined, then the incorporation of retrospective data may be beneficial. In other situations, their incorporation is too problematic to be recommended.
