On assessing model fit for distribution‐free longitudinal models under missing data

The generalized estimating equation (GEE), a distribution-free, or semi-parametric, approach for modeling longitudinal data, is used in a wide range of behavioral, psychotherapy, pharmaceutical drug safety, and healthcare-related research studies. Most popular methods for assessing model fit are based on the likelihood function for parametric models, which makes them inappropriate for distribution-free GEE. One rare exception is a score statistic initially proposed by Tsiatis (1980) for logistic regression and later extended to GEE by Barnhart and Williamson (1998). Because GEE provides valid inference only under the missing completely at random assumption, and missing values arising in most longitudinal studies do not follow such a restrictive mechanism, this GEE-based score test has very limited application in practice. We propose extensions of this goodness-of-fit test to address missing data under the missing at random assumption, a more realistic model that applies to most studies in practice. We examine the performance of the proposed tests using simulated data and demonstrate their utility with data from a real study on geriatric depression and associated medical comorbidities.
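For orientation, the contrast between the two missing-data assumptions can be written out in standard notation. This is only a sketch: the abstract does not give the form of the proposed tests, and the symbols below ($D_i$, $V_i$, $r_{it}$, $\pi_{it}$, $\Delta_i$) are assumptions introduced here for illustration. The weighted form is the inverse-probability-weighted estimating equation in the style of Robins et al. [1], shown as one common route to valid inference under missing at random, not necessarily the construction used by the authors.

```latex
% Standard GEE for subject i with response vector y_i and mean \mu_i(\beta):
% valid when data are complete or missing completely at random.
\[
  \sum_{i=1}^{n} D_i^{\top} V_i^{-1} \bigl( y_i - \mu_i(\beta) \bigr) = 0,
  \qquad D_i = \frac{\partial \mu_i(\beta)}{\partial \beta^{\top}} .
\]

% Inverse-probability-weighted GEE (assumed illustration, in the style of [1]):
% r_{it} = 1 if y_{it} is observed and 0 otherwise, and \pi_{it} is the
% probability that y_{it} is observed given the observed history.
\[
  \sum_{i=1}^{n} D_i^{\top} V_i^{-1} \Delta_i \bigl( y_i - \mu_i(\beta) \bigr) = 0,
  \qquad \Delta_i = \operatorname{diag}\!\left( \frac{r_{it}}{\pi_{it}} \right) .
\]
```

When every $\pi_{it}$ is constant across subjects and time, the weights cancel out of the second equation, and it reduces to the first equation applied to the observed responses. This is the missing completely at random case.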

[1] J. Robins, et al. Analysis of semiparametric regression models for repeated outcomes in the presence of missing data, 1995.

[2] D. Rubin. Inference and missing data, 1975.

[3] Thomas A. Louis, et al. Matching conditional and marginal shapes in binary random intercept models using a bridge distribution function, 2003.

[4] B. Linn, et al. Cumulative illness rating scale, 1968, Journal of the American Geriatrics Society.

[5] Hui Zhang, et al. Modeling longitudinal binomial responses: implications from two dueling paradigms, 2011.

[6] S. Zeger, et al. Longitudinal data analysis using generalized linear models, 1986.

[7] C. Reynolds, et al. The relationship of medical comorbidity and depression in older, primary care patients, 2006, Psychosomatics.

[8] A. Tsiatis. A note on a goodness-of-fit test for the logistic regression model, 1980.

[9] Scott Evans, et al. A comparison of goodness of fit tests for the logistic GEE model, 2005, Statistics in medicine.

[10] Jeanne Kowalski, et al. Modern Applied U-Statistics, 2007.

[11] James M. Robins, et al. Semiparametric Regression for Repeated Outcomes With Nonignorable Nonresponse, 1998.

[12] M. Lawton, et al. Assessment of Older People: Self-Maintaining and Instrumental Activities of Daily Living, 1969.

[13] Geert Molenberghs, et al. Likelihood Based Frequentist Inference When Data Are Missing at Random, 1998.

[14] W. Pan. On the robust variance estimator in generalised estimating equations, 2001.

[15] Hui Zhang, et al. On the Impact of Parametric Assumptions and Robust Alternatives for Longitudinal Data Analysis, 2009, Biometrical journal. Biometrische Zeitschrift.

[16] G. Fitzmaurice, et al. A caveat concerning independence estimating equations with multivariate binary data, 1995, Biometrics.

[17] H. Barnhart, et al. Goodness-of-fit tests for GEE modeling with binary responses, 1998, Biometrics.

[18] M. Pepe, et al. A cautionary note on inference for marginal regression models with longitudinal data and general correlated response data, 1994.

[19] Janet B W Williams, et al. A structured interview guide for the Hamilton Depression Rating Scale, 1988, Archives of general psychiatry.

[20] E. Demidenko, et al. Mixed Models: Theory and Applications (Wiley Series in Probability and Statistics), 2004.