Self-assessed performance improves statistical fusion of image labels.

PURPOSE: Expert manual labeling is the gold standard for image segmentation, but the process is difficult and time-consuming, and it is prone to inter-individual differences. While fully automated methods have successfully targeted many anatomies, they have not yet been developed for numerous essential structures (e.g., the internal structure of the spinal cord as seen on magnetic resonance imaging). Collaborative labeling is a new paradigm that offers a robust alternative and may realize both the throughput of automation and the guidance of experts. Yet distributing manual labeling expertise across individuals and sites introduces potential human-factors concerns (e.g., training, software usability) and statistical considerations (e.g., fusion of information, assessment of confidence, bias) that must be explored further. During labeling, it is simple to ask raters to self-assess the confidence of their labels, but this is rarely done and has not previously been studied quantitatively. Herein, the authors explore the utility of self-assessment relative to automated assessment of rater performance in the context of statistical fusion.

METHODS: The authors conducted a study of 66 volumes manually labeled by 75 minimally trained human raters recruited from the university undergraduate population. Raters received 15 min of training, during which they were shown examples of correct segmentation and the online segmentation tool was demonstrated. The volumes were labeled slice by slice in 2D, and the slices were presented in no particular order. For each slice, raters produced a self-assessed quality metric by marking a confidence bar superimposed on the slice. Volumes produced by both voting and statistical fusion algorithms were compared against a set of expert segmentations of the same volumes.

RESULTS: Labels for 8825 distinct slices were obtained. Simple majority voting performed statistically significantly worse than voting weighted by self-assessed performance.
The performance of statistical fusion was statistically indistinguishable from that of self-assessment-weighted voting. The authors developed a new theoretical basis for using self-assessed performance within the statistical fusion framework and demonstrated that combining both sources of information (statistical assessment and self-assessment) yielded a statistically significant improvement over either method alone.

CONCLUSIONS: The authors present the first systematic characterization of self-assessed performance in manual labeling. They demonstrate that self-assessment and statistical fusion yield similar but complementary benefits for label fusion. Finally, they present a new theoretical basis for combining self-assessments with statistical label fusion.
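To make the voting comparison concrete, here is a minimal Python sketch of binary majority voting and of voting weighted by self-assessed confidence, each scored against an expert segmentation. It is illustrative only: the function names `fuse_votes` and `dice`, the (voxel, label, confidence) tuple layout, the [0, 1] confidence scale, and the use of the Dice coefficient as the accuracy metric are all assumptions, not details taken from the study.

```python
from collections import defaultdict


def fuse_votes(observations, weighted=True):
    """Fuse binary labels per voxel by voting (hypothetical helper).

    observations: iterable of (voxel_id, label, confidence) tuples, where
    label is 0 or 1 and confidence is a self-assessed score in [0, 1]
    (an assumed scale). With weighted=False every vote counts as 1,
    i.e. simple majority voting. Ties resolve to 0, an arbitrary choice.
    """
    score = defaultdict(float)  # signed vote total per voxel
    for voxel, label, confidence in observations:
        weight = confidence if weighted else 1.0
        score[voxel] += weight if label == 1 else -weight
    return {voxel: int(total > 0) for voxel, total in score.items()}


def dice(fused, truth):
    """Dice overlap of the foreground (label 1) between two label maps."""
    both = sum(1 for v, lab in fused.items() if lab == 1 and truth.get(v) == 1)
    size = sum(fused.values()) + sum(truth.values())
    return 2.0 * both / size if size else 1.0


# Toy data (invented): voxel 0 has one confident "1" against two unsure "0" votes.
observations = [
    (0, 1, 0.9), (0, 0, 0.2), (0, 0, 0.3),
    (1, 1, 0.8), (1, 1, 0.7), (1, 0, 0.9),
]
expert = {0: 1, 1: 1}

majority = fuse_votes(observations, weighted=False)  # {0: 0, 1: 1}
confidence_weighted = fuse_votes(observations)       # {0: 1, 1: 1}
print(dice(majority, expert), dice(confidence_weighted, expert))
```

The toy numbers only show the mechanism: when a confident minority disagrees with an unsure majority, weighting by self-assessment can flip the fused label. Because the study collected one confidence mark per slice, every voxel in a slice would share that rater's slice-level weight.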

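For the statistical fusion side, the following is a minimal Python sketch of binary STAPLE (simultaneous truth and performance level estimation), a widely used expectation-maximization approach that estimates each rater's sensitivity and specificity jointly with a probabilistic consensus. It stands in for the statistical fusion algorithms compared in the study; it is not the authors' implementation, and it does not include their self-assessment extension, whose formulation is not given in this abstract. The data layout (a dict per rater mapping voxel index to a 0/1 label, so raters may skip voxels, as with unordered slice-wise labeling), the initial value 0.9, the clamping constant, and the global prior are assumptions.

```python
def staple(labels, n_voxels, iterations=50, init=0.9, eps=1e-6):
    """Binary STAPLE via EM (a minimal sketch, not the study's code).

    labels: dict rater -> dict voxel_index -> 0/1; raters may omit voxels.
    Returns (w, p, q): w[v] is the posterior probability that voxel v is
    foreground; p[r] and q[r] are rater r's sensitivity and specificity.
    """
    # Assumed global prior: fraction of all observed labels that are 1.
    observed = [lab for votes in labels.values() for lab in votes.values()]
    prior = sum(observed) / len(observed)
    p = {r: init for r in labels}
    q = {r: init for r in labels}
    w = [prior] * n_voxels

    for _ in range(iterations):
        # E-step: likelihood of each voxel's votes under truth = 1 and truth = 0.
        fg = [prior] * n_voxels
        bg = [1.0 - prior] * n_voxels
        for r, votes in labels.items():
            for v, lab in votes.items():
                fg[v] *= p[r] if lab else 1.0 - p[r]
                bg[v] *= 1.0 - q[r] if lab else q[r]
        w = [fg[v] / (fg[v] + bg[v]) for v in range(n_voxels)]

        # M-step: re-estimate performance only over voxels each rater labeled.
        for r, votes in labels.items():
            pos = sum(w[v] for v in votes)
            neg = sum(1.0 - w[v] for v in votes)
            if pos > 0:
                p[r] = sum(w[v] for v, lab in votes.items() if lab) / pos
            if neg > 0:
                q[r] = sum(1.0 - w[v] for v, lab in votes.items() if not lab) / neg
            # Clamp away from 0 and 1 so the E-step never divides by zero.
            p[r] = min(max(p[r], eps), 1.0 - eps)
            q[r] = min(max(q[r], eps), 1.0 - eps)
    return w, p, q


# Toy data (invented): raters A and B agree; rater C makes two errors.
raters = {
    "A": {0: 1, 1: 1, 2: 1, 3: 0, 4: 0, 5: 0},
    "B": {0: 1, 1: 1, 2: 1, 3: 0, 4: 0, 5: 0},
    "C": {0: 1, 1: 0, 2: 1, 3: 1, 4: 0, 5: 0},
}
w, p, q = staple(raters, n_voxels=6)
fused = [int(x > 0.5) for x in w]
print(fused, round(p["A"], 3), round(p["C"], 3))
```

Here rater C is down-weighted through its lower estimated sensitivity and specificity rather than through any self-report, which corresponds to the automated assessment of rater performance that the abstract contrasts with self-assessment. With many raters per voxel, a practical version would accumulate the E-step products in the log domain to avoid floating-point underflow.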