Common Limitations of Image Processing Metrics

While the importance of automatic image analysis is increasing at an enormous pace, recent meta-research has revealed major flaws with respect to algorithm validation. Performance metrics are key to objective, transparent, and comparable performance assessment, yet relatively little attention has been given to the practical pitfalls of using specific metrics for a given image analysis task. A common mission of several international initiatives is therefore to provide researchers with guidelines and tools for choosing performance metrics in a problem-aware manner. This dynamically updated document aims to illustrate important limitations of performance metrics commonly applied in the field of image analysis. The current version is based on a Delphi process on metrics conducted by an international consortium of image analysis experts.
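One well-known pitfall of the kind this document catalogs can be sketched in a few lines (the masks and function below are illustrative, not taken from the original): the Dice similarity coefficient penalizes the same one-pixel boundary error far more heavily for small structures than for large ones, so identical localization accuracy yields very different scores depending on object size.

```python
import numpy as np

def dice(pred, gt):
    """Dice similarity coefficient between two binary masks."""
    pred = pred.astype(bool)
    gt = gt.astype(bool)
    intersection = np.logical_and(pred, gt).sum()
    denom = pred.sum() + gt.sum()
    return 2.0 * intersection / denom if denom else 1.0

# Identical one-pixel boundary shift applied to a large and a small object:
large_gt = np.zeros((100, 100), bool); large_gt[20:80, 20:80] = True
large_pred = np.zeros((100, 100), bool); large_pred[20:80, 21:81] = True  # 1-px shift

small_gt = np.zeros((100, 100), bool); small_gt[48:52, 48:52] = True
small_pred = np.zeros((100, 100), bool); small_pred[48:52, 49:53] = True  # same shift

print(round(dice(large_gt, large_pred), 3))  # ~0.983
print(round(dice(small_gt, small_pred), 3))  # 0.75
```

The large object loses less than two points of Dice for the shift, while the small object loses a quarter of its score, which is one reason a single overlap metric should not be chosen without considering the sizes of the target structures.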
