New approaches to measuring the performance of programs that generate differential diagnoses using ROC curves and other metrics

INTRODUCTION
Evaluation of computer programs that generate multiple diagnoses can be hampered by a lack of effective, well-recognized performance metrics. We have developed a method to calculate mean sensitivity and specificity for multiple diagnoses and to generate ROC curves.

METHODS
Data came from a clinical evaluation of the Heart Disease Program (HDP). Sensitivity, specificity, and positive and negative predictive value (PPV, NPV) were calculated for each diagnosis type in the study. A weighted mean of overall sensitivity and specificity was derived and used to create an ROC curve. The alternative metrics Comprehensiveness and Relevance were calculated for each case and compared with the other measures.

RESULTS
Weighted mean sensitivity closely matched Comprehensiveness, and mean PPV matched Relevance. Plotting the physicians' sensitivity and specificity on the ROC curve showed that their discrimination was similar to that of the HDP, but their sensitivity was significantly lower.

CONCLUSIONS
These metrics give a clear picture of a program's diagnostic performance and allow straightforward comparison between different programs and different studies.
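
As a rough illustration of the kind of calculation the METHODS section describes, the sketch below computes per-diagnosis sensitivity, specificity, PPV, and NPV from confusion-matrix counts and then takes a weighted mean across diagnosis types. The DiagnosisCounts structure, the example counts, and the prevalence-based weighting are illustrative assumptions; the abstract does not specify the HDP study's actual weighting scheme.

from dataclasses import dataclass

@dataclass
class DiagnosisCounts:
    # Confusion-matrix counts for one diagnosis type, pooled across all cases.
    tp: int  # program listed the diagnosis and it was present
    fp: int  # program listed the diagnosis but it was absent
    fn: int  # diagnosis was present but the program missed it
    tn: int  # diagnosis was absent and the program did not list it

def sensitivity(c: DiagnosisCounts) -> float:
    return c.tp / (c.tp + c.fn) if (c.tp + c.fn) else 0.0

def specificity(c: DiagnosisCounts) -> float:
    return c.tn / (c.tn + c.fp) if (c.tn + c.fp) else 0.0

def ppv(c: DiagnosisCounts) -> float:
    return c.tp / (c.tp + c.fp) if (c.tp + c.fp) else 0.0

def npv(c: DiagnosisCounts) -> float:
    return c.tn / (c.tn + c.fn) if (c.tn + c.fn) else 0.0

def weighted_mean(values, weights):
    # Mean weighted per diagnosis type; weighting by prevalence is an
    # assumption made for this sketch, not the study's documented method.
    return sum(v * w for v, w in zip(values, weights)) / sum(weights)

# Hypothetical counts for two diagnosis types.
counts = [
    DiagnosisCounts(tp=40, fp=10, fn=5, tn=45),
    DiagnosisCounts(tp=12, fp=8, fn=10, tn=70),
]
prevalence = [c.tp + c.fn for c in counts]  # cases in which each diagnosis was present

mean_sens = weighted_mean([sensitivity(c) for c in counts], prevalence)
mean_spec = weighted_mean([specificity(c) for c in counts], prevalence)
print(f"weighted mean sensitivity = {mean_sens:.2f}, specificity = {mean_spec:.2f}")

Repeating this calculation at several program score thresholds yields a set of (1 - specificity, sensitivity) points; joining them traces an ROC curve on which an individual clinician's operating point can also be plotted, as described in the RESULTS paragraph above.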