Logistic regression when the outcome is measured with uncertainty.

In epidemiologic research, logistic regression is often used to estimate the odds of an outcome of interest as a function of predictors. In some datasets, however, the outcome is measured with imperfect sensitivity and specificity. It is well known that the misclassification induced by such an imperfect diagnostic test leads to biased estimates of the odds ratios and their variances. In this paper, the authors show that when the sensitivity and specificity of a diagnostic test are known, it is straightforward to incorporate this information into the fitting of logistic regression models. An EM algorithm that produces unbiased estimates of the odds ratios and their variances is described. The resulting odds ratio estimates tend to be farther from the null, but have greater variance, than estimates obtained by ignoring the imperfections of the test. The method can be extended to the situation where the sensitivity and specificity differ for different study subjects, i.e., differential misclassification. The method is useful even when the sensitivity and specificity are not known, as a way to assess how strongly various assumptions about sensitivity and specificity affect one's estimates. The method can also be used to estimate sensitivity and specificity under certain assumptions or when a validation subsample is available. Several examples are provided to compare the results of this method with those obtained by standard logistic regression. A SAS macro that implements the method is available on the World Wide Web at http://som1.ab.umd.edu/Epidemiology/software.html.
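To make the idea concrete, the following is a minimal Python sketch of an EM fit of this kind. It is not the authors' SAS macro. It assumes a single known sensitivity `se` and specificity `sp` shared by all subjects, with `se + sp > 1`. The function names (`em_logistic`, `fit_fractional_logit`), the starting values, and the convergence tolerances are illustrative assumptions. The E-step computes each subject's posterior probability of being truly positive given the observed test result. The M-step refits an ordinary logistic regression using those probabilities as fractional outcomes. Variance estimation is omitted.

```python
import numpy as np


def expit(z):
    """Inverse logit."""
    return 1.0 / (1.0 + np.exp(-z))


def fit_fractional_logit(X, q, beta, n_newton=25):
    """M-step (hypothetical helper): Newton-Raphson for a logistic
    regression whose outcome q is a probability in [0, 1]."""
    for _ in range(n_newton):
        p = expit(X @ beta)
        grad = X.T @ (q - p)                        # score vector
        hess = (X * (p * (1.0 - p))[:, None]).T @ X  # information matrix
        step = np.linalg.solve(hess, grad)
        beta = beta + step
        if np.max(np.abs(step)) < 1e-10:
            break
    return beta


def em_logistic(X, w, se, sp, n_iter=500, tol=1e-8):
    """EM estimate of logistic coefficients when the observed binary
    outcome w is an imperfect test of the true outcome.

    X  : (n, k) design matrix, including an intercept column
    w  : (n,) observed test results coded 0/1
    se : assumed known sensitivity, P(w = 1 | truly positive)
    sp : assumed known specificity, P(w = 0 | truly negative)
    """
    beta = np.zeros(X.shape[1])
    for _ in range(n_iter):
        p = expit(X @ beta)  # current P(truly positive | x)
        # E-step: posterior probability of a true positive given w
        num = np.where(w == 1, se * p, (1.0 - se) * p)
        den = num + np.where(w == 1, (1.0 - sp) * (1.0 - p), sp * (1.0 - p))
        q = num / den
        # M-step: refit with the posterior probabilities as outcomes
        new_beta = fit_fractional_logit(X, q, beta)
        converged = np.max(np.abs(new_beta - beta)) < tol
        beta = new_beta
        if converged:
            break
    return beta
```

Calling `em_logistic(X, w, 1.0, 1.0)` reduces to standard logistic regression, since the E-step then returns the observed outcomes unchanged. Refitting over a grid of assumed `se` and `sp` values is one way to carry out the sensitivity analysis described above.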
