Empirical evaluation of a hybrid intelligent monitoring system using different measures of effectiveness

The validation of a software product is a fundamental part of its development, and focuses on analysing whether the software correctly solves the problems it was designed to tackle. Traditional approaches to validation compare results against what is called a gold standard. In certain domains, however, it is not always easy, or even possible, to establish such a standard. This is the case for intelligent systems that endeavour to simulate or emulate a model of expert behaviour. This article describes the validation of the computer-aided foetal evaluator (CAFE), an intelligent system developed for monitoring the antenatal condition based on data from the non-stress test (NST), and shows how this validation was accomplished through a methodology designed to address the problem of validating intelligent systems. System performance was compared to that of three obstetricians using 3450 min of cardiotocographic (CTG) records corresponding to 53 different patients. Different parameters were extracted from these records and interpreted, and the validation was carried out on a parameter-by-parameter basis using measurement techniques such as percentage agreement, the Kappa statistic and cluster analysis. Results showed that the system's agreement with the experts is, in general, similar to the agreement between the experts themselves, which in turn permits our system to be considered at least as skillful as our experts. Throughout the article, the results obtained are discussed with a view to demonstrating how using different measures of the level of agreement between system and experts can assist not only in assessing the aptness of a system, but also in highlighting its weaknesses. This kind of assessment means that the system can be fine-tuned repeatedly until the expected results are obtained.
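As an illustration of two of the agreement measures named above, the following is a minimal Python sketch of percentage agreement and the unweighted Kappa statistic for a pair of raters (for example, the system and one expert). It is not the procedure used in the study: the function names, the category labels and the example ratings are hypothetical and chosen only to show the arithmetic.

```python
from collections import Counter


def percentage_agreement(rater_a, rater_b):
    """Fraction of items on which the two raters assign the same category."""
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("ratings must be non-empty and of equal length")
    matches = sum(a == b for a, b in zip(rater_a, rater_b))
    return matches / len(rater_a)


def cohen_kappa(rater_a, rater_b):
    """Unweighted Cohen's kappa: observed agreement corrected for chance."""
    n = len(rater_a)
    p_observed = percentage_agreement(rater_a, rater_b)
    counts_a = Counter(rater_a)
    counts_b = Counter(rater_b)
    # Chance agreement: probability that both raters pick the same category
    # if each rated independently according to its own marginal frequencies.
    categories = set(counts_a) | set(counts_b)
    p_chance = sum(counts_a[c] * counts_b[c] for c in categories) / (n * n)
    if p_chance == 1.0:
        # Both raters used one identical category for every item.
        return 1.0
    return (p_observed - p_chance) / (1.0 - p_chance)


# Hypothetical ratings of six records (labels are illustrative only).
system = ["normal", "normal", "suspicious", "pathological", "normal", "suspicious"]
expert = ["normal", "suspicious", "suspicious", "pathological", "normal", "normal"]

print(round(percentage_agreement(system, expert), 3))
print(round(cohen_kappa(system, expert), 3))
```

In this made-up example the raters agree on 4 of 6 records, yet Kappa is noticeably lower because part of that agreement is expected by chance. This is one reason why reporting several measures side by side can be more informative than percentage agreement alone.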
