In many observational studies, individuals are measured repeatedly over time, although not necessarily at a set of prespecified occasions. Instead, individuals may be measured at irregular intervals, and those with a history of poorer health outcomes may be measured more frequently, with shorter intervals between their repeated measurements. In this article, we consider estimation of regression parameters in models for longitudinal data where the follow-up times are not fixed by design but can depend on previous outcomes. In particular, we focus on general linear models for longitudinal data where the repeated measures are assumed to have a multivariate Gaussian distribution. We consider assumptions regarding the follow-up time process that result in the likelihood function separating into two components: one for the follow-up time process, the other for the outcome process. The practical implication of this separation is that the former process can be ignored when making likelihood-based inferences about the latter; that is, maximum likelihood (ML) estimation of the regression parameters relating the mean of the longitudinal outcomes to covariates does not require that a model for the distribution of follow-up times be specified. As a result, standard statistical software, e.g., SAS PROC MIXED (Littell et al., 1996, SAS System for Mixed Models), can be used to analyze the data. However, we also demonstrate that misspecification of the model for the covariance among the repeated measures will, in general, result in biased regression parameter estimates. Furthermore, results of a simulation study indicate that the bias due to misspecification of the covariance can be considerable in this setting. Finally, we illustrate these results using data from a longitudinal observational study (Lipshultz et al., 1995, New England Journal of Medicine 332, 1738-1743) that explored the cardiotoxic effects of doxorubicin chemotherapy for the treatment of acute lymphoblastic leukemia in children.
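The bias claim can be illustrated with a minimal simulation sketch. It is not the simulation study reported in the article but a deliberately simplified two-occasion analog: the second measurement is taken only when the first outcome falls below a cutoff (lower values standing in for poorer health). The function name `simulate`, the cutoff, the correlation, and the sample size are all hypothetical choices made for illustration. The sketch compares the estimate of the second-occasion mean under a working independence covariance (the plain sample mean of the observed values) with the ML estimate under the correctly specified bivariate Gaussian model.

```python
import numpy as np


def simulate(n=200_000, rho=0.7, cutoff=0.0, seed=0):
    """Compare two estimates of the occasion-2 mean (true value 0).

    Assumed setup (hypothetical, for illustration only): outcomes at two
    occasions are bivariate Gaussian with zero means, unit variances and
    correlation rho; the second outcome is measured only if the first
    outcome is below `cutoff`.
    """
    rng = np.random.default_rng(seed)
    cov = np.array([[1.0, rho], [rho, 1.0]])
    y = rng.multivariate_normal(mean=[0.0, 0.0], cov=cov, size=n)
    y1, y2 = y[:, 0], y[:, 1]

    # Follow-up depends only on the previously observed outcome.
    followed = y1 < cutoff
    y1_obs, y2_obs = y1[followed], y2[followed]

    # Working independence: the occasion-2 mean is estimated by the
    # sample mean of the observed occasion-2 outcomes.
    naive = y2_obs.mean()

    # ML under the correct bivariate Gaussian model: regress y2 on y1
    # among those followed, then predict at the mean of y1 over everyone.
    slope = np.cov(y1_obs, y2_obs)[0, 1] / np.var(y1_obs, ddof=1)
    ml = y2_obs.mean() + slope * (y1.mean() - y1_obs.mean())
    return naive, ml


if __name__ == "__main__":
    naive, ml = simulate()
    print(f"working independence: {naive:.3f}, ML with correct covariance: {ml:.3f}")
```

With these assumed settings, the working-independence estimate has expectation rho times E[Y1 | Y1 < 0], roughly -0.56, while the ML estimate stays close to the true value of 0, because only the latter uses the correlation between occasions to correct for outcome-dependent follow-up.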
[1] E. J. Orav, et al. Female sex and higher drug dose as risk factors for late cardiotoxic effects of doxorubicin therapy for childhood cancer. The New England Journal of Medicine, 1995.
[2] S. Zeger, et al. Longitudinal data analysis using generalized linear models. 1986.
[3] H. Akaike. A new look at the statistical model identification. 1974.
[4] J. Kalbfleisch, et al. The Statistical Analysis of Failure Time Data. 1980.
[5] R. Gelber, et al. Treatment of childhood acute lymphoblastic leukemia: results of Dana-Farber Cancer Institute/Children's Hospital Acute Lymphoblastic Leukemia Consortium Protocol 85-01. Journal of Clinical Oncology: Official Journal of the American Society of Clinical Oncology, 1994.
[6] S. Colan, et al. Late cardiac effects of doxorubicin therapy for acute lymphoblastic leukemia in childhood. The New England Journal of Medicine, 1991.
[7] R. Littell. SAS System for Mixed Models. 1996.
[8] P. J. Huber. The behavior of maximum likelihood estimates under nonstandard conditions. 1967.
[9] H. White. Maximum Likelihood Estimation of Misspecified Models. 1982.
[10] R. H. Jones, et al. Unequally spaced longitudinal data with AR(1) serial correlation. Biometrics, 1991.
[11] D. Rubin, et al. Statistical Analysis with Missing Data. 1989.