Receiver operating characteristic analysis for intelligent medical systems: a new approach for finding confidence intervals

Intelligent systems are increasingly being deployed in medicine and healthcare, but a robust and objective methodology for evaluating them is still needed. Receiver operating characteristic (ROC) analysis could form the basis of such an evaluation. However, it has several weaknesses when applied to the types of data used to evaluate intelligent medical systems. First, the data sets are often small, and existing methods perform unsatisfactorily on them. Second, many existing ROC methods rely on parametric assumptions that may not be valid for the test cases selected. Third, system evaluations are often more concerned with particular, clinically meaningful points on the curve than with global indexes such as the more commonly used area under the curve. A novel, robust and accurate method is proposed, derived from first principles, which calculates the probability density function (pdf) for each point on a ROC curve for any given sample size. Confidence intervals are produced as contours on the pdf. The theoretical work has been validated by Monte Carlo simulations. It has also been applied to two real-world examples of ROC analysis taken from the literature (classification of mammograms and differential diagnosis of pancreatic diseases), both to investigate the confidence surfaces produced for real cases and to illustrate how analysis of system performance can be enhanced. We illustrate the impact of sample size on system performance through analysis of ROC pdfs and 95% confidence boundaries. This work establishes an important new method for generating pdfs, and provides an accurate and robust method of producing confidence intervals for ROC curves for the small sample sizes typical of intelligent medical systems. It is conjectured that the method could be extended to determine the risks associated with deploying intelligent medical systems in clinical practice.
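To make the idea of a confidence region around a single ROC point concrete, the following minimal Python sketch may help. It is not the derivation used in this work. It assumes independent test cases and a fixed decision threshold, so that the observed true-positive and false-positive counts follow two independent binomial distributions. The function names, the example rates (0.8 and 0.2) and the sample sizes are illustrative assumptions. The sketch builds the exact joint probability mass function of the observed point, takes the highest-probability cells until 95% of the mass is covered, and checks that coverage with a small Monte Carlo simulation.

```python
import random
from math import comb


def binom_pmf(k, n, p):
    """Probability of k successes in n independent trials with success rate p."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)


def roc_point_pmf(n_pos, n_neg, tpf, fpf):
    """Joint pmf of the observed (false positives, true positives) counts.

    Assumes independent cases, so the two counts are independent binomials.
    """
    pos = [binom_pmf(k, n_pos, tpf) for k in range(n_pos + 1)]
    neg = [binom_pmf(k, n_neg, fpf) for k in range(n_neg + 1)]
    return {
        (fp, tp): neg[fp] * pos[tp]
        for fp in range(n_neg + 1)
        for tp in range(n_pos + 1)
    }


def highest_mass_region(pmf, level=0.95):
    """Smallest set of grid cells, taken in order of probability, covering `level`."""
    region, total = [], 0.0
    for cell, p in sorted(pmf.items(), key=lambda kv: kv[1], reverse=True):
        if total >= level:
            break
        region.append(cell)
        total += p
    return region, total


def simulate_coverage(region, n_pos, n_neg, tpf, fpf, trials=20000, seed=0):
    """Monte Carlo estimate of how often a simulated ROC point lands in `region`."""
    rng = random.Random(seed)
    inside = set(region)
    hits = 0
    for _ in range(trials):
        tp = sum(rng.random() < tpf for _ in range(n_pos))
        fp = sum(rng.random() < fpf for _ in range(n_neg))
        hits += (fp, tp) in inside
    return hits / trials


def region_fraction(n, tpf=0.8, fpf=0.2, level=0.95):
    """Fraction of the ROC grid occupied by the region, with n cases per class."""
    pmf = roc_point_pmf(n, n, tpf, fpf)
    region, _ = highest_mass_region(pmf, level)
    return len(region) / len(pmf)


if __name__ == "__main__":
    for n in (10, 30, 100):
        print(f"n = {n:3d} per class: region covers "
              f"{region_fraction(n):.1%} of the ROC grid")
```

Under these assumptions the region occupies a much larger share of ROC space when there are 10 cases per class than when there are 100, which is the kind of sample-size effect the abstract describes. A contour on a continuous pdf, as proposed here, plays the same role as the discrete highest-mass set in this sketch.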
