Performance evaluation of medical expert systems using ROC curves.

This paper presents a performance evaluation of the diagnostic accuracy of the medical expert system CADIAG-2/PANCREAS. The study included 47 clinical cases from a university hospital with 51 diagnoses of pancreatic diseases (four patients had two pancreatic diseases). The histologically or clinically confirmed diagnoses were taken as the gold standard. Performance was studied along three lines: (a) each case was evaluated twice, first by restricting patient data to history, physical examination, and basic laboratory tests and, second, by using the complete set of data, which also included special laboratory tests, US, X-ray, CT scan, ECG, and biopsy, if available; (b) with regard to CADIAG-2's hypothesis generation, each evaluation series was also carried out twice, first by testing whether the gold standard was the first diagnosis in the ranked list of hypotheses and, second, whether the gold standard was among the hypotheses at all; (c) receiver operating characteristic (ROC) curves were determined by varying an internal threshold that controls the extent of CADIAG-2's diagnostic hypothesis generation. The evaluation showed that CADIAG-2's initial list of diagnostic hypotheses, based on patient history, physical examination, and basic laboratory tests, usually already includes the gold standard diagnosis; an application of CADIAG-2 at a very early stage of the diagnostic process therefore seems achievable. Moreover, given the complete set of the patient's medical data, the gold standard is usually ranked first in the list of hypotheses, except for patients with chronic diseases for whom only unspecific findings are available. The last test series showed that ROC curves not only allow optimal adjustment of the expert system's internal ad hoc decision criteria, such as thresholds, weights, and scores, but also provide a basis for better comparing the performance of different medical expert systems.
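The threshold sweep described in (c) can be illustrated with a minimal sketch. It assumes each candidate diagnosis carries a numeric score and is emitted as a hypothesis when the score reaches the threshold; the abstract does not specify how CADIAG-2's internal threshold works, so this is an illustration of the general ROC technique rather than the system's actual mechanism. The function names `roc_points` and `auc_trapezoid` and the toy scores and labels are hypothetical and are not data from the study.

```python
from typing import List, Sequence, Tuple

Point = Tuple[float, float]  # (false positive rate, true positive rate)


def roc_points(scores: Sequence[float],
               labels: Sequence[int],
               thresholds: Sequence[float]) -> List[Point]:
    """Return one (FPR, TPR) point per threshold.

    A candidate counts as a generated hypothesis when score >= threshold.
    labels: 1 if the candidate matches the gold standard, 0 otherwise.
    """
    positives = sum(labels)
    negatives = len(labels) - positives
    points = []
    for t in thresholds:
        tp = sum(1 for s, y in zip(scores, labels) if s >= t and y == 1)
        fp = sum(1 for s, y in zip(scores, labels) if s >= t and y == 0)
        points.append((fp / negatives, tp / positives))
    return points


def auc_trapezoid(points: Sequence[Point]) -> float:
    """Area under the ROC curve by the trapezoidal rule."""
    # Add the trivial end points and order by FPR, then TPR.
    pts = sorted(set(points) | {(0.0, 0.0), (1.0, 1.0)})
    return sum((x1 - x0) * (y0 + y1) / 2
               for (x0, y0), (x1, y1) in zip(pts, pts[1:]))


# Toy data (hypothetical): six candidate diagnoses with scores and
# gold-standard agreement flags.
scores = [0.9, 0.8, 0.7, 0.6, 0.4, 0.2]
labels = [1, 1, 0, 1, 0, 0]
thresholds = [1.0, 0.85, 0.75, 0.65, 0.5, 0.3, 0.0]

points = roc_points(scores, labels, thresholds)
auc = auc_trapezoid(points)

if __name__ == "__main__":
    for t, (fpr, tpr) in zip(thresholds, points):
        print(f"threshold={t:.2f}  FPR={fpr:.2f}  TPR={tpr:.2f}")
    print(f"AUC={auc:.3f}")
```

Lowering the threshold admits more hypotheses, which raises sensitivity at the cost of more false positives; plotting the resulting points gives the ROC curve, and the area under it offers one threshold-independent figure for comparing systems.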
