The effect of retrospective sampling on binary regression models for clustered data.

Binary regression models for clustered or correlated observations have recently received a great deal of attention. The data of interest consist of a binary response variable together with independent variables X1, ..., Xk, where sets of observations are grouped into clusters. A number of models and methods of analysis have been suggested for such data. Many of these extend, in some way, the familiar logistic regression model for ungrouped binary data (i.e., data in which each cluster is of size 1). In general, analyses based on these clustered-data models assume that the observed clusters are a simple random sample from a population of clusters. In this paper, we consider the application of these procedures to the case where the clusters are selected randomly in a manner that depends on the pattern of responses in the cluster. In particular, we show that ignoring the retrospective nature of the sample design, by fitting standard logistic regression models for clustered binary data, may result in misleading estimates of the effects of covariates and of the precision of the estimated regression coefficients.
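To make the concern concrete, the following sketch computes how response-dependent cluster selection can distort a marginal log odds ratio. It is an illustration only, not the method of the paper. Every detail is an assumption chosen for simplicity: clusters of size 2, a single binary cluster-level covariate, responses that are independent within a cluster given the covariate, and a selection rule that always retains clusters with at least one positive response and retains all-negative clusters with probability `keep_prob`. The function names are hypothetical.

```python
import math
from itertools import product


def expit(z):
    """Inverse logit."""
    return 1.0 / (1.0 + math.exp(-z))


def sampled_log_odds(p, keep_prob, size=2):
    """Log odds of Y = 1 among observations in retained clusters.

    Assumption: responses within a cluster are independent with
    success probability p, and a cluster is retained with probability 1
    if it has any positive response, else with probability keep_prob.
    """
    pos = 0.0  # expected retained observations with y = 1
    neg = 0.0  # expected retained observations with y = 0
    for pattern in product((0, 1), repeat=size):
        k = sum(pattern)
        prob = p ** k * (1.0 - p) ** (size - k)
        keep = 1.0 if k > 0 else keep_prob
        pos += prob * keep * k
        neg += prob * keep * (size - k)
    return math.log(pos / neg)


def sampled_log_odds_ratio(alpha, beta, keep_prob, size=2):
    """Log odds ratio (x = 1 vs x = 0) seen in the retained sample,
    when the population model is logit P(Y = 1 | x) = alpha + beta * x."""
    return (sampled_log_odds(expit(alpha + beta), keep_prob, size)
            - sampled_log_odds(expit(alpha), keep_prob, size))


if __name__ == "__main__":
    alpha, beta = -2.0, 1.0  # hypothetical population parameters
    for q in (1.0, 0.5, 0.1):
        print(q, sampled_log_odds_ratio(alpha, beta, q))
```

With `keep_prob = 1.0` the sample is a simple random sample of clusters and the population coefficient `beta` is recovered. As `keep_prob` falls, the log odds ratio in the retained sample moves away from `beta`, even in this simplified setting with no within-cluster correlation.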
