Estimation in regression models for longitudinal binary data with outcome-dependent follow-up.

In many observational studies, individuals are measured repeatedly over time, although not necessarily at a set of pre-specified occasions. Instead, individuals may be measured at irregular intervals, with those having a history of poorer health outcomes being measured with somewhat greater frequency and regularity. In this paper, we consider likelihood-based estimation of the regression parameters in marginal models for longitudinal binary data when the follow-up times are not fixed by design, but can depend on previous outcomes. In particular, we consider assumptions regarding the follow-up time process that result in the likelihood function separating into two components: one for the follow-up time process, the other for the outcome measurement process. The practical implication of this separation is that the follow-up time process can be ignored when making likelihood-based inferences about the marginal regression model parameters. That is, maximum likelihood (ML) estimation of the regression parameters relating the probability of success at a given time to covariates does not require that a model for the distribution of follow-up times be specified. However, to obtain consistent parameter estimates, the multinomial distribution for the vector of repeated binary outcomes must be correctly specified. In general, ML estimation requires specification of all higher-order moments, and the likelihood for a marginal model can be intractable except in cases where the number of repeated measurements is relatively small. To circumvent these difficulties, we propose a pseudolikelihood for estimation of the marginal model parameters. The pseudolikelihood uses a linear approximation for the conditional distribution of the response at any occasion, given the history of previous responses. The appeal of this approximation is that the conditional distributions are functions of the first two moments of the binary responses only. When the follow-up times depend only on the previous outcome, the pseudolikelihood requires correct specification of the conditional distribution of the current outcome given the outcome at the previous occasion only. Results from a simulation study and a study of asymptotic bias are presented. Finally, we illustrate the main results using data from a longitudinal observational study that explored the cardiotoxic effects of doxorubicin chemotherapy for the treatment of acute lymphoblastic leukemia in children.
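
The first-order case described above (conditioning on the previous outcome only) can be made concrete with a small sketch. It is an illustration under stated assumptions, not the authors' implementation: the logistic link, the single covariate, the constant correlation `rho`, and all function names are hypothetical choices made here. The conditional probability is written as a linear function of the previous response, built from the marginal means and the pairwise covariance only.

```python
import math


def marginal_mean(beta0, beta1, x):
    """Marginal success probability under an assumed logistic model."""
    return 1.0 / (1.0 + math.exp(-(beta0 + beta1 * x)))


def conditional_prob(mu_t, mu_prev, rho, y_prev):
    """Linear form of P(Y_t = 1 | Y_{t-1} = y_prev).

    Uses only first and second moments: the two marginal means and
    the covariance implied by the (assumed constant) correlation rho.
    """
    var_prev = mu_prev * (1.0 - mu_prev)
    cov = rho * math.sqrt(mu_t * (1.0 - mu_t) * var_prev)
    p = mu_t + (cov / var_prev) * (y_prev - mu_prev)
    # Clip so the log terms below stay finite if the linear form
    # strays outside (0, 1) for extreme parameter values.
    return min(max(p, 1e-10), 1.0 - 1e-10)


def pseudo_loglik(beta0, beta1, rho, xs, ys):
    """Pseudo-log-likelihood for one subject's observed sequence.

    xs: covariate value at each observed occasion (hypothetical input).
    ys: binary outcomes (0 or 1) at those occasions.
    The first occasion contributes its marginal Bernoulli term; each
    later occasion contributes a term conditional on the previous outcome.
    """
    mu = [marginal_mean(beta0, beta1, x) for x in xs]
    ll = ys[0] * math.log(mu[0]) + (1 - ys[0]) * math.log(1.0 - mu[0])
    for t in range(1, len(ys)):
        p = conditional_prob(mu[t], mu[t - 1], rho, ys[t - 1])
        ll += ys[t] * math.log(p) + (1 - ys[t]) * math.log(1.0 - p)
    return ll
```

Note that no model for the follow-up times appears anywhere in the function: only the outcomes and covariates at the observed occasions enter. Summing `pseudo_loglik` over subjects and maximizing over `beta0`, `beta1`, and `rho` with a general-purpose optimizer would give parameter estimates under these assumptions.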
