Assessing predictive accuracy: how to compare Brier scores.

Several investigators have used the Brier index to measure the predictive accuracy of a set of medical judgments; comparing the Brier scores of different raters who have evaluated the same patients provides a measure of their relative accuracy. However, such comparisons may be difficult to interpret because there is no statistical test for differentiating between two Brier scores. To demonstrate a method for addressing this issue, we analyzed the judgments of five medical students, each of whom independently evaluated the same 25 patients with recurrent chest pain. Using the method, we determined that two of the students gave judgments that were incompatible with the actual observed outcomes (p < 0.05); of the three remaining students, we detected a significant difference between two (p < 0.05). These results differed from those of receiver operating characteristic (ROC) curve area analysis, another technique used to evaluate predictive accuracy. We suggest that the proposed method can provide a useful tool for investigators using the Brier index to compare how well clinicians express uncertainty with probability judgments.
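The comparisons described in the abstract can be made concrete with a minimal Python sketch. It assumes each rater gives a probability in [0, 1] for every patient and that outcomes are coded 0/1; all function names are hypothetical. The single-rater check uses Spiegelhalter's (1986) z statistic for the Brier score, which asks whether a set of forecasts is compatible with the observed outcomes; that it matches the authors' exact procedure is an assumption. The two-rater comparison is a generic paired z test on per-patient squared-error differences, swapped in for illustration rather than taken from the paper, and the ROC area is computed with the Mann-Whitney formulation.

```python
import math
from typing import Sequence, Tuple

# Hypothetical helper names; inputs are forecast probabilities in [0, 1]
# and observed outcomes coded 0/1, listed in the same patient order.


def brier(probs: Sequence[float], outcomes: Sequence[int]) -> float:
    """Brier score: mean squared difference between forecast and outcome."""
    return sum((y - p) ** 2 for p, y in zip(probs, outcomes)) / len(probs)


def two_sided_p(z: float) -> float:
    """Two-sided p-value under a standard normal approximation."""
    return math.erfc(abs(z) / math.sqrt(2.0))


def calibration_z(probs: Sequence[float], outcomes: Sequence[int]) -> Tuple[float, float]:
    """Spiegelhalter-style test of H0: outcome_i ~ Bernoulli(probs[i]).

    Under H0, E[(y - p)^2] = p(1 - p) and Var[(y - p)^2] = (1 - 2p)^2 p(1 - p).
    A large |z| suggests the forecasts are incompatible with the outcomes.
    """
    n = len(probs)
    expected = sum(p * (1 - p) for p in probs) / n
    variance = sum((1 - 2 * p) ** 2 * p * (1 - p) for p in probs) / n ** 2
    if variance == 0:
        raise ValueError("zero variance: every forecast is 0, 0.5 or 1")
    z = (brier(probs, outcomes) - expected) / math.sqrt(variance)
    return z, two_sided_p(z)


def paired_brier_z(probs_a: Sequence[float], probs_b: Sequence[float],
                   outcomes: Sequence[int]) -> Tuple[float, float]:
    """Generic large-sample paired z test of B_a - B_b (not the paper's method).

    Each patient contributes d_i = (y_i - a_i)^2 - (y_i - b_i)^2, so mean(d) = B_a - B_b.
    """
    n = len(outcomes)
    diffs = [(y - a) ** 2 - (y - b) ** 2 for a, b, y in zip(probs_a, probs_b, outcomes)]
    mean = sum(diffs) / n
    var = sum((d - mean) ** 2 for d in diffs) / (n - 1)
    if var == 0:
        raise ValueError("per-patient differences have zero variance")
    z = mean / math.sqrt(var / n)
    return z, two_sided_p(z)


def roc_auc(probs: Sequence[float], outcomes: Sequence[int]) -> float:
    """ROC area as the Mann-Whitney probability that a random positive case
    receives a higher forecast than a random negative one (ties count 1/2)."""
    pos = [p for p, y in zip(probs, outcomes) if y == 1]
    neg = [p for p, y in zip(probs, outcomes) if y == 0]
    wins = sum(1.0 if a > b else 0.5 if a == b else 0.0 for a in pos for b in neg)
    return wins / (len(pos) * len(neg))


if __name__ == "__main__":
    # Toy, made-up data: both raters rank the patients identically,
    # but rater B's forecasts are compressed toward 0.5.
    outcomes = [1, 1, 0, 0]
    rater_a = [0.9, 0.8, 0.2, 0.1]
    rater_b = [0.5, 0.45, 0.4, 0.35]
    print(roc_auc(rater_a, outcomes), roc_auc(rater_b, outcomes))  # 1.0 1.0
    print(brier(rater_a, outcomes), brier(rater_b, outcomes))
```

With the made-up four-patient data, both raters reach an ROC area of 1.0 because they rank the patients identically, yet rater A's Brier score (about 0.025) is far lower than rater B's (about 0.209). This shows one way ROC area and Brier-score analyses can disagree: ROC area reflects only how well forecasts separate patients with and without the outcome, whereas the Brier score also penalizes poorly calibrated probabilities. With only four patients neither z test is meaningful, since both rely on large-sample normal approximations.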
