Data analytic methods for matched case-control studies.

The recent introduction of complex multivariate statistical models in matched case-control studies is a mixed blessing. Their use can lead to a better understanding of the way in which many variables contribute to the risk of disease. On the other hand, these powerful methods can obscure salient features in the data that might have been detected by other, less sophisticated methods. This shortcoming is due to a lack of supporting methodology for the routine use of these models. Satisfactory computation of estimated relative risks and their standard errors does not by itself justify the fitted model; goodness of fit must be examined if inferences are to be trusted. This paper is concerned with the analysis of matched case-control studies with logistic models, with emphasis on the analogies between these models and linear regression models. In particular, basic concepts such as analysis of variance, the multiple correlation coefficient, one-degree-of-freedom tests, and residual analysis are discussed. The relatively new field of regression diagnostics is also introduced. All procedures are illustrated with data from a study of bladder cancer in males.