The need to develop guidelines for the evaluation of medical image processing procedures

Evaluations of procedures in medical image processing are notoriously difficult and often unconvincing. From a detailed bibliographic study, we analyzed how evaluation studies are conducted and extracted a number of entities common to any evaluation protocol. From this analysis, we propose a generic evaluation model (GEM). The GEM introduces the notion of hierarchical evaluation, identifies the components that must always be defined when designing an evaluation protocol, and shows the relationships between these components. By suggesting rules that apply to the different components of the GEM, we also show how this model can serve as a first step toward guidelines for evaluation.
