Validation of the medical expert system PNEUMON-IA.

The present study validates the expert system PNEUMON-IA, which assesses the etiology of community-acquired pneumonia from clinical, radiological, and laboratory data obtained at the onset of the disease. Validation was performed using data from the medical records of 76 patients with a clinically proven diagnosis of pneumonia. The etiological diagnoses provided by PNEUMON-IA were compared with those established by five specialists who had not been involved in developing the expert system. For each candidate etiology, both PNEUMON-IA and the experts assigned a causal possibility expressed as a linguistic label (e.g., "almost impossible"), and these labels were then converted to numeric values. In most cases, no confirmed etiological diagnosis was available to serve as a gold standard. To overcome this limitation, the distances between the arrays of etiological possibilities given by the specialists and by PNEUMON-IA were used as a measure of agreement between diagnoses. A cluster analysis based on those distances was then used to place PNEUMON-IA among the experts. The results showed that PNEUMON-IA differed from the specialists to the same extent that the specialists differed among themselves. The method used to validate PNEUMON-IA could prove useful for assessing the performance of expert systems in fields where no gold standard is available.
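The distance-based comparison described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the procedure used in the study. The abstract does not specify any of the following, so all of them are hypothetical choices made for the example:

- the five-step label scale and its numeric values
- the etiology list
- the per-patient Euclidean distance averaged over patients
- the average-linkage (UPGMA) clustering
- all rater names and ratings

```python
import math
from itertools import combinations

# ASSUMPTION: hypothetical label scale and numeric values; the study's
# actual labels and their numeric conversion are not given in the abstract.
LABEL_VALUES = {
    "almost impossible": 0.0,
    "unlikely": 0.25,
    "possible": 0.5,
    "likely": 0.75,
    "almost certain": 1.0,
}

# ASSUMPTION: illustrative subset of candidate etiologies (column order).
ETIOLOGIES = ["pneumococcus", "Legionella", "Mycoplasma", "virus"]


def to_vector(labels):
    """Convert one rater's labels for one patient into a numeric array."""
    if len(labels) != len(ETIOLOGIES):
        raise ValueError("expected one label per etiology")
    return [LABEL_VALUES[label] for label in labels]


def euclidean(u, v):
    """ASSUMPTION: Euclidean distance between two possibility arrays."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def mean_distance(cases_x, cases_y):
    """Average the per-patient distance between two raters over all patients."""
    dists = [euclidean(to_vector(x), to_vector(y)) for x, y in zip(cases_x, cases_y)]
    return sum(dists) / len(dists)


def distance_matrix(ratings):
    """Mean distance for every unordered pair of raters (experts and system)."""
    return {
        frozenset((x, y)): mean_distance(ratings[x], ratings[y])
        for x, y in combinations(ratings, 2)
    }


def average_linkage(names, d):
    """Naive agglomerative clustering with average linkage (UPGMA).

    Returns the merge history as (cluster, cluster, linkage distance) tuples.
    """
    clusters = [frozenset([n]) for n in names]
    merges = []

    def link(c1, c2):
        # Mean of all pairwise rater distances between the two clusters.
        return sum(d[frozenset((a, b))] for a in c1 for b in c2) / (len(c1) * len(c2))

    while len(clusters) > 1:
        c1, c2 = min(combinations(clusters, 2), key=lambda pair: link(*pair))
        merges.append((sorted(c1), sorted(c2), link(c1, c2)))
        clusters = [c for c in clusters if c not in (c1, c2)] + [c1 | c2]
    return merges


# Hypothetical ratings: one list of labels (in ETIOLOGIES order) per patient.
ratings = {
    "expert_A": [["likely", "unlikely", "possible", "unlikely"],
                 ["unlikely", "almost certain", "unlikely", "almost impossible"]],
    "expert_B": [["almost certain", "unlikely", "possible", "unlikely"],
                 ["unlikely", "likely", "unlikely", "almost impossible"]],
    "system":   [["likely", "almost impossible", "possible", "unlikely"],
                 ["possible", "likely", "unlikely", "almost impossible"]],
}

if __name__ == "__main__":
    d = distance_matrix(ratings)
    for left, right, height in average_linkage(list(ratings), d):
        print(left, "+", right, f"at {height:.3f}")
```

With these made-up ratings, the two hypothetical experts merge first (mean distance 0.250). The hypothetical system then joins them at about 0.302. Under this criterion, a system whose merge height falls within the range of the inter-expert distances behaves like one more expert. No gold-standard diagnosis is needed for the comparison.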
