Logistic regression of family data from case-control studies

SUMMARY
Multivariate regression models are applied to binary disease data in families identified from case-control studies. Attention is restricted to 'marginal' or reproducible models, i.e. those whose parameters have the same interpretation in the marginal distributions for all subsets of a family, with a logistic specification of each individual's marginal disease probability. For such models, it is shown that case-control family data can be analysed as if they were obtained from a prospective study, with the baseline disease probabilities of case and control probands differing from that of their relatives. This result extends that of Anderson (1972) and Prentice & Pyke (1979) from the probands' data to include the disease outcomes and covariates of their family members. It contrasts with the inconsistent parameter estimates that arise in nonreproducible models when the case-control sampling design is ignored (Tosteson, Rosner & Redline, 1991). The contrast underscores the need, before analysing case-control data prospectively, to check the plausibility of the reproducibility assumption, which requires that the covariates be independent of any unmeasured factors responsible for the familial correlation of disease occurrence.
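As a minimal illustration of the proband-level result cited above (Anderson, 1972; Prentice & Pyke, 1979), and not of the family-data extension, the Python sketch below works with expected cell counts rather than simulated data. It assumes a single binary covariate x, a population model logit P(D=1|x) = alpha + beta*x, and hypothetical sampling fractions pi_case and pi_control; all parameter values and function names are illustrative choices, not taken from the paper. Fitting the logistic model prospectively to the case-control counts recovers beta exactly, while only the intercept is shifted, to alpha + log(pi_case/pi_control).

```python
import math


def expit(z):
    """Inverse logit."""
    return 1.0 / (1.0 + math.exp(-z))


def expected_case_control_counts(alpha, beta, p_exposed, n_pop, pi_case, pi_control):
    """Expected (cases, controls) in the sample for x = 0 and x = 1.

    Hypothetical setup: every diseased person is sampled with probability
    pi_case and every disease-free person with probability pi_control.
    """
    counts = {}
    for x, px in ((0, 1.0 - p_exposed), (1, p_exposed)):
        risk = expit(alpha + beta * x)  # population P(D=1 | x)
        cases = n_pop * px * risk * pi_case
        controls = n_pop * px * (1.0 - risk) * pi_control
        counts[x] = (cases, controls)
    return counts


def prospective_logistic_fit(counts):
    """MLE of (a, b) in logit P(D=1 | x) = a + b*x, ignoring the sampling design.

    With one binary covariate the model is saturated, so the MLE reproduces
    the observed case proportion in each x group (closed form, no iteration).
    """
    c0, k0 = counts[0]
    c1, k1 = counts[1]
    a = math.log(c0 / k0)
    b = math.log(c1 / k1) - a
    return a, b


# Illustrative values: rare disease, all cases sampled, 1% of controls sampled.
alpha, beta = -4.0, 0.7
pi_case, pi_control = 1.0, 0.01
counts = expected_case_control_counts(alpha, beta, p_exposed=0.3, n_pop=1e6,
                                      pi_case=pi_case, pi_control=pi_control)
a_hat, b_hat = prospective_logistic_fit(counts)

print("slope:", b_hat, "true beta:", beta)
print("intercept:", a_hat, "alpha + log(pi_case/pi_control):",
      alpha + math.log(pi_case / pi_control))
```

Because this one-covariate model is saturated, no iterative fitting is needed; with continuous or multiple covariates a general logistic regression routine would be used, and the prospective fit still estimates the odds-ratio parameters consistently while absorbing the sampling fractions into the intercept.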