Cumulative sojourn time in longitudinal studies: a sequential imputation method to handle missing health state data due to dropout

Missing data are ubiquitous in longitudinal studies. In this paper, we propose an imputation procedure for handling dropout in longitudinal studies. By taking advantage of the monotone missing-data pattern that dropout produces, our imputation procedure can be carried out sequentially, one visit at a time, conditioning on the earlier visits that are already observed or imputed. This substantially reduces the computational complexity. In addition, at each step of the sequential imputation, we set up a model selection mechanism that chooses between a parametric model and a nonparametric model to impute each missing observation. Conventional model selection procedures aim to find a single model that fits the entire data set well. Our procedure, in contrast, is tailored to find a suitable model for predicting each individual missing observation.
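The abstract does not specify the models or the selection criterion. The Python sketch below is therefore only an illustration of the two ideas it describes, and it rests on several assumptions. Health states are binary, each subject has a baseline covariate `x`, and the first visit is always observed. Logistic regression stands in for the parametric model and k-nearest neighbours for the nonparametric one. A hypothetical local Brier-score rule (`choose_model`) picks between them separately for each missing value. Because dropout is monotone, visit `t` can be imputed as soon as visit `t-1` is complete for everyone, so each step needs only a one-visit model. All function names are hypothetical and none of this is the authors' implementation.

```python
import math
import random

# Assumed data layout (hypothetical, not from the paper): each subject is a dict
# with a baseline covariate "x" in [0, 1] and a list "y" of binary health states,
# where y[0] is always observed and None marks every visit after dropout.

def sigmoid(z):
    z = max(-30.0, min(30.0, z))  # clip to avoid overflow in math.exp
    return 1.0 / (1.0 + math.exp(-z))

def fit_logistic(X, y, iters=300, lr=0.5):
    """Parametric model (assumed): logistic regression fit by batch gradient ascent."""
    w = [0.0] * (len(X[0]) + 1)  # intercept followed by one weight per feature

    def prob(xi):
        return sigmoid(w[0] + sum(wj * xj for wj, xj in zip(w[1:], xi)))

    for _ in range(iters):
        grad = [0.0] * len(w)
        for xi, yi in zip(X, y):
            r = yi - prob(xi)
            grad[0] += r
            for j, xj in enumerate(xi):
                grad[j + 1] += r * xj
        w[:] = [wj + lr * g / len(X) for wj, g in zip(w, grad)]
    return prob

def knn_prob(X, y, xi, k, exclude=None):
    """Nonparametric model (assumed): share of state 1 among the k nearest observed subjects."""
    ranked = sorted((math.dist(xj, xi), j) for j, xj in enumerate(X) if j != exclude)
    nearest = [j for _, j in ranked[:k]]
    return sum(y[j] for j in nearest) / len(nearest)

def choose_model(X, y, xi, logit, k):
    """Hypothetical selection rule: compare local Brier scores on the k observed
    subjects closest to the missing observation xi (leave-one-out for k-NN)."""
    ranked = sorted((math.dist(xj, xi), j) for j, xj in enumerate(X))
    near = [j for _, j in ranked[:k]]
    brier_par = sum((logit(X[j]) - y[j]) ** 2 for j in near) / len(near)
    brier_np = sum((knn_prob(X, y, X[j], k, exclude=j) - y[j]) ** 2 for j in near) / len(near)
    return "parametric" if brier_par <= brier_np else "nonparametric"

def sequential_impute(subjects, k=5, seed=0):
    """Impute visit t for every dropout, for t = 1, 2, ..., in order."""
    rng = random.Random(seed)
    data = [{"x": s["x"], "y": list(s["y"])} for s in subjects]  # leave input intact
    choices = []
    for t in range(1, len(data[0]["y"])):
        observed = [s for s, o in zip(data, subjects) if o["y"][t] is not None]
        missing = [s for s, o in zip(data, subjects) if o["y"][t] is None]
        if not missing:
            continue
        if len(observed) < 2:
            raise ValueError(f"need at least two subjects observed at visit {t}")
        # Monotone dropout: y[t-1] is already complete for every subject (observed,
        # or imputed at step t-1), so one model of y[t] given y[t-1] and x suffices.
        X = [(s["y"][t - 1], s["x"]) for s in observed]
        y = [s["y"][t] for s in observed]
        logit = fit_logistic(X, y)
        for s in missing:
            xi = (s["y"][t - 1], s["x"])
            model = choose_model(X, y, xi, logit, k)
            p = logit(xi) if model == "parametric" else knn_prob(X, y, xi, k)
            s["y"][t] = 1 if rng.random() < p else 0  # draw a state, not just predict
            choices.append((t, model))
    return data, choices
```

The returned `choices` list records which model was selected for each missing observation, which makes the per-observation selection visible. Running `sequential_impute` with several seeds would give several completed data sets; whether and how the paper combines such data sets is not stated in the abstract.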