Foibles, follies, and fusion: Web-based collaboration for medical image labeling

Labels that identify specific anatomical and functional structures within medical images are essential for characterizing the relationship between structure and function in many scientific and clinical studies. Automated high-throughput methods have not yet been developed for all anatomical targets, nor validated for exceptional anatomies, so manual labeling remains the gold standard in many cases. However, manually placing labels within a large image volume, such as one acquired with magnetic resonance imaging (MRI), is exceptionally challenging, resource intensive, and prone to intra- and inter-rater variability. Statistical methods that combine labels produced by multiple raters have grown significantly in popularity, in part because it is thought that estimating and accounting for each rater's reliability yields more accurate estimates of the true labels. This paper evaluates the performance of a class of these statistical label combination methods using real-world data contributed by minimally trained human raters. The consistency of the statistical estimates, their accuracy relative to the individual observations, and the variability of both the estimates and the individual observations as a function of the number of labels are presented. Statistical fusion is shown to successfully combine label information from online (Internet-based) collaborations among minimally trained raters. This first successful demonstration of a statistically based approach with minimally trained raters opens numerous possibilities for very large-scale collaborative efforts. Extending and generalizing these technologies to new applications offers promising directions for continued research.
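Statistical fusion methods of the kind described above estimate each rater's reliability jointly with the unknown true labels. The code below is a minimal sketch of one well-known member of this class: binary STAPLE (Simultaneous Truth and Performance Level Estimation), fitted by expectation-maximization. It is not the authors' implementation, and it rests on three assumptions:

- labels are binary;
- the prior probability of the foreground label is fixed globally;
- raters may skip voxels. A skipped voxel is recorded as `None`, as can happen when many minimally trained raters each label only part of a volume.

The function name `staple_binary`, its parameters (`n_iter`, `init`, `eps`), and the toy data are hypothetical illustrations.

```python
import math
from typing import List, Optional, Sequence, Tuple

# obs[i][j] is rater j's label for voxel i: 1, 0, or None if rater j skipped voxel i.
Obs = Sequence[Sequence[Optional[int]]]


def _sigmoid(x: float) -> float:
    """Numerically stable logistic function (log-odds -> probability)."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)
    return z / (1.0 + z)


def staple_binary(
    obs: Obs, n_iter: int = 50, init: float = 0.9, eps: float = 1e-6
) -> Tuple[List[float], List[float], List[float]]:
    """Hypothetical binary STAPLE sketch fitted by EM, tolerating missing labels.

    Returns (w, p, q): w[i] = posterior P(true label of voxel i is 1),
    p[j] = estimated sensitivity of rater j, q[j] = estimated specificity.
    """
    n_vox, n_rat = len(obs), len(obs[0])
    # Assumption: fixed global prior = fraction of observed labels equal to 1.
    seen = [d for row in obs for d in row if d is not None]
    f = sum(seen) / len(seen) if seen else 0.5
    f = min(max(f, eps), 1.0 - eps)
    prior_logit = math.log(f) - math.log(1.0 - f)
    p = [init] * n_rat
    q = [init] * n_rat
    w = [f] * n_vox
    for _ in range(n_iter):
        # E-step: sum per-rater log-likelihood ratios in log-odds space,
        # which avoids underflow when many raters label the same voxel.
        for i, row in enumerate(obs):
            x = prior_logit
            for j, d in enumerate(row):
                if d is None:
                    continue  # a skipped voxel carries no evidence
                if d == 1:
                    x += math.log(p[j]) - math.log(1.0 - q[j])
                else:
                    x += math.log(1.0 - p[j]) - math.log(q[j])
            w[i] = _sigmoid(x)
        # M-step: re-estimate each rater's performance using only voxels they labeled.
        for j in range(n_rat):
            tp = pos = tn = neg = 0.0
            for i in range(n_vox):
                d = obs[i][j]
                if d is None:
                    continue
                pos += w[i]
                neg += 1.0 - w[i]
                if d == 1:
                    tp += w[i]
                else:
                    tn += 1.0 - w[i]
            # Clamp away from 0 and 1 so the logs in the E-step stay finite.
            if pos > 0:
                p[j] = min(max(tp / pos, eps), 1.0 - eps)
            if neg > 0:
                q[j] = min(max(tn / neg, eps), 1.0 - eps)
    return w, p, q


# Toy example (invented data): raters 0 and 1 are accurate (rater 1 skips two
# voxels); rater 2 flips four of the ten labels.
truth = [1, 1, 1, 1, 1, 0, 0, 0, 0, 0]
r0 = list(truth)
r1 = [1, 1, None, 1, 1, 0, 0, None, 0, 0]
r2 = [0, 0, 1, 1, 1, 1, 1, 0, 0, 0]
obs = [list(v) for v in zip(r0, r1, r2)]
w, p, q = staple_binary(obs)
fused = [int(x > 0.5) for x in w]
print("fused:", fused)
print("sensitivity:", [round(s, 3) for s in p])
print("specificity:", [round(s, 3) for s in q])
```

In this toy run, the unreliable third rater receives a lower estimated sensitivity and specificity, so its votes are down-weighted. A plain majority vote, by contrast, would weight all three raters equally. Accumulating evidence in log-odds, rather than multiplying raw probabilities, keeps the per-voxel posterior from underflowing as the number of raters grows.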
