Evaluation of uterine cervix segmentations using ground truth from multiple experts

This work focuses on the generation and utilization of a reliable ground truth (GT) segmentation for a large medical repository of digital cervicographic images (cervigrams) collected by the National Cancer Institute (NCI). NCI invited twenty experts to manually segment a set of 939 cervigrams into regions of medical and anatomical interest. Based on these unique data, the objectives of the current work are to: (1) automatically generate a multi-expert GT segmentation map; (2) use the GT map to automatically assess the complexity of a given segmentation task; and (3) use the GT map to evaluate the performance of an automated segmentation algorithm. The multi-expert GT map is generated with the STAPLE (Simultaneous Truth and Performance Level Estimation) algorithm, a well-known method for generating a GT segmentation from multiple observations. A new measure of segmentation complexity, based on the inter-observer variability within the GT map, is defined. This measure is used to identify images that the experts found difficult to segment and to compare the complexity of different segmentation tasks. An accuracy measure for evaluating the performance of automated segmentation algorithms is also presented. Two algorithms for cervix boundary detection are compared using the proposed accuracy measure, which is shown to reflect the actual segmentation quality achieved by the algorithms. The methods and conclusions presented in this work are general and can be applied to other images and segmentation tasks. Here, they are applied to the cervigram database, together with a thorough analysis of the available data.
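STAPLE treats each expert's segmentation as a noisy observation of an unknown true map and uses expectation-maximization (EM) to estimate both that map and each expert's sensitivity and specificity. The following is a minimal sketch of the binary (two-class) form of STAPLE in NumPy, assuming each cervigram's expert masks have been flattened into an `(n_pixels, n_raters)` array of 0/1 labels and that the foreground prior is fixed to the mean vote. The function name `staple_binary` and its parameters are hypothetical illustrations, not the authors' implementation. This sketch also handles only a single region, not the multi-region maps described above.

```python
import numpy as np


def staple_binary(decisions, max_iter=100, tol=1e-7, init=0.99):
    """Sketch of binary STAPLE via EM (hypothetical helper).

    decisions: array of shape (n_pixels, n_raters) with 0/1 labels.
    Returns (W, p, q): per-pixel posterior probability of foreground,
    per-rater sensitivity p, and per-rater specificity q.
    """
    D = np.asarray(decisions, dtype=float)
    n_pixels, n_raters = D.shape
    eps = 1e-10
    # Assumption: a global foreground prior fixed to the mean rater vote,
    # clipped so that its logarithm stays finite.
    f = float(np.clip(D.mean(), eps, 1 - eps))
    p = np.full(n_raters, init)  # sensitivities, initialised high
    q = np.full(n_raters, init)  # specificities, initialised high
    W = np.zeros(n_pixels)
    for _ in range(max_iter):
        # E-step: log-likelihood of each pixel's votes if the true label is
        # foreground (log_a) or background (log_b), computed in log space.
        log_a = D @ np.log(p + eps) + (1 - D) @ np.log(1 - p + eps)
        log_b = (1 - D) @ np.log(q + eps) + D @ np.log(1 - q + eps)
        log_fg = np.log(f) + log_a
        log_bg = np.log(1 - f) + log_b
        W = np.exp(log_fg - np.logaddexp(log_fg, log_bg))
        # M-step: re-estimate each rater's performance from the soft truth W.
        p_new = (W @ D) / max(W.sum(), eps)
        q_new = ((1 - W) @ (1 - D)) / max((1 - W).sum(), eps)
        converged = (np.max(np.abs(p_new - p)) < tol
                     and np.max(np.abs(q_new - q)) < tol)
        p, q = p_new, q_new
        if converged:
            break
    return W, p, q


# Usage on synthetic data: five raters, the last one much noisier.
rng = np.random.default_rng(0)
truth = rng.random(3000) < 0.4
flip = np.array([0.1, 0.1, 0.1, 0.1, 0.3])  # per-rater label-flip rates
noise = rng.random((3000, 5)) < flip
votes = np.where(noise, ~truth[:, None], truth[:, None]).astype(int)
W, p, q = staple_binary(votes)
```

The soft map `W` can be thresholded at 0.5 to obtain a hard GT map. Pixels whose posterior stays near 0.5 are those on which the raters disagree, which is the kind of inter-observer variability that a complexity measure can draw on. The estimated `p` and `q` then rank the raters, and in this synthetic example the noisier fifth rater receives lower values.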
