On outcome-dependent sampling designs for longitudinal binary response data with time-varying covariates.

A typical longitudinal study prospectively collects both repeated measures of a health status outcome and covariates that serve either as the primary predictor of interest or as important adjustment factors. In many situations, all covariates are measured on the entire study cohort. In some scenarios, however, the primary covariates are time-dependent yet can be ascertained retrospectively after the study is complete. A common example is covariate measurement based on stored biological specimens such as blood plasma. Although generalizations of the standard case-control design have been proposed in which the clustered outcome measurements are used to selectively ascertain covariates (Neuhaus and Jewell, 1990), and which therefore provide resource-efficient collection of information, these designs do not appear to be commonly used. One potential barrier to the use of longitudinal outcome-dependent sampling designs may have been the lack of a flexible class of likelihood-based analysis methods. With the relatively recent development of flexible and practical methods such as generalized linear mixed models (Breslow and Clayton, 1993) and marginalized models for categorical longitudinal data (see Heagerty and Zeger, 2000, for an overview), the class of likelihood-based methods is now sufficiently well developed to capture the major forms of longitudinal correlation found in biomedical repeated measures data. The goal of this manuscript is therefore to promote the consideration of outcome-dependent longitudinal sampling designs and to outline and evaluate the basic conditional likelihood analysis that allows valid statistical inference.

[1] L. Sheppard, et al. Effects of ambient air pollution on symptoms of asthma in Seattle-area children enrolled in the CAMP study, 2000, Environmental Health Perspectives.

[2] P. Heagerty. Marginally Specified Logistic‐Normal Models for Longitudinal Binary Data, 1999, Biometrics.

[3] M. Lesperance, et al. Estimation efficiency in a binary mixed-effects model setting, 1996.

[4] M. Palta, et al. Analysis of longitudinal data with unmeasured confounders, 1991, Biometrics.

[5] H. White. Maximum Likelihood Estimation of Misspecified Models, 1982.

[6] Michelle L. Bell, et al. A Meta-Analysis of Time-Series Studies of Ozone and Mortality With Comparison to the National Morbidity, Mortality, and Air Pollution Study, 2005, Epidemiology.

[7] Patrick J. Heagerty, et al. Marginalized Transition Models and Likelihood Inference for Longitudinal Categorical Data, 2002, Biometrics.

[8] P. Diggle, et al. Analysis of Longitudinal Data, 1997.

[9] Robert A. Wood, et al. Association of Low-Level Ozone and Fine Particles With Respiratory Symptoms in Children With Asthma, 2004, Pediatrics.

[10] N. Breslow, et al. Approximate inference in generalized linear mixed models, 1993.

[11] B. Leroux, et al. Efficiency of regression estimates for clustered data, 1996, Biometrics.

[12] S. J. London, et al. A study of twelve Southern California communities with differing levels and types of air pollution. I. Prevalence of respiratory morbidity, 1999, American Journal of Respiratory and Critical Care Medicine.

[13] Lianne Sheppard, et al. Insights on bias and information in group-level studies, 2003, Biostatistics.

[14] J. Ware, et al. Random-effects models for serial observations with binary response, 1984, Biometrics.

[15] P. Heagerty, et al. Regression analysis of longitudinal binary data with time-dependent environmental covariates: bias and efficiency, 2005, Biostatistics.

[16] J. M. Neuhaus, et al. The effect of retrospective sampling on binary regression models for clustered data, 1990, Biometrics.

[17] Patrick J. Heagerty, et al. Marginalized Models for Moderate to Long Series of Longitudinal Binary Response Data, 2007, Biometrics.

[18] Louise Ryan, et al. A Case‐Cohort Design for Assessing Covariate Effects in Longitudinal Studies, 2005, Biometrics.

[19] J. Anderson. Separate sample logistic discrimination, 1972.

[20] J. Kalbfleisch, et al. Between- and within-cluster covariate effects in the analysis of clustered data, 1998, Biometrics.

[21] D. Dockery, et al. The effect of air pollution on inner-city children with asthma, 2002, European Respiratory Journal.

[22] H. Origasa. Longitudinal Data Analysis Using Linear Models, 1988.

[23] R. Pyke, et al. Logistic disease incidence models and case-control studies, 1979.

[24] Scott L. Zeger, et al. Marginalized Multilevel Models and Likelihood Inference, 2000.

[25] W. S. Linn, et al. A study of twelve Southern California communities with differing levels and types of air pollution. II. Effects on pulmonary function, 1999, American Journal of Respiratory and Critical Care Medicine.

[26] Jianwen Cai, et al. On case-control sampling of clustered data, 1997.

[27] A. Azzalini. Logistic regression for autocorrelated data with application to repeated measures, 1994.

[28] S. Zeger, et al. Longitudinal data analysis using generalized linear models, 1986.

[29] G. Fitzmaurice, et al. A caveat concerning independence estimating equations with multivariate binary data, 1995, Biometrics.