On evaluating brain tissue classifiers without a ground truth

In this paper, we present a set of techniques for evaluating brain tissue classifiers on a large data set of MR images of the head. Because a gold standard is difficult to establish for this type of data, we focus on methods that do not require a ground truth but instead rely on a common agreement principle. Three techniques are presented: Williams' index, a measure of common agreement; STAPLE, an Expectation Maximization algorithm that simultaneously estimates performance parameters and constructs an estimated reference standard; and Multidimensional Scaling, a visualization technique for exploring similarity data. We apply these evaluation methodologies to a set of eleven segmentation algorithms on forty MR images. We then validate our evaluation pipeline by building a ground truth based on human expert tracings and comparing the evaluations with and without it. Our findings show that comparing classifiers without a gold standard can provide useful information: outliers can be easily detected, strongly consistent or highly variable techniques can be readily discriminated, and the overall similarity between different techniques can be assessed. On the other hand, we also find that some information present in the expert segmentations is not captured by the automatic classifiers, suggesting that common agreement alone may not be sufficient for a precise performance evaluation of brain tissue classifiers.
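
To make the common agreement principle concrete, the following is a minimal sketch of Williams' index for one classifier among several. It is an illustration under stated assumptions, not the evaluation code used in this work: the pairwise similarity is assumed to be the Jaccard overlap of binary masks, the masks are flattened to plain 0/1 lists, and the function names (jaccard, williams_index) are hypothetical. The index divides the average agreement between classifier j and the others by the average agreement among the others, so a value near or above 1 suggests that j agrees with the group at least as well as the group members agree with each other.

```python
from itertools import combinations


def jaccard(a, b):
    """Jaccard overlap of two binary masks given as equal-length 0/1 sequences.

    Assumption: this similarity is chosen for illustration; any symmetric
    pairwise agreement measure could be passed to williams_index instead.
    """
    inter = sum(1 for x, y in zip(a, b) if x and y)
    union = sum(1 for x, y in zip(a, b) if x or y)
    # Two empty masks are treated as being in full agreement.
    return inter / union if union else 1.0


def williams_index(masks, j, similarity=jaccard):
    """Williams' index of classifier j against the remaining classifiers."""
    r = len(masks)
    if r < 3:
        raise ValueError("Williams' index needs at least three classifiers")
    others = [k for k in range(r) if k != j]
    # Summed agreement between classifier j and each other classifier.
    with_j = sum(similarity(masks[j], masks[k]) for k in others)
    # Summed agreement over all pairs that do not involve classifier j.
    among_others = sum(
        similarity(masks[k], masks[l]) for k, l in combinations(others, 2)
    )
    if among_others == 0:
        raise ValueError("the other classifiers share no agreement at all")
    # Ratio of the two averages: (with_j / (r-1)) / (among_others / C(r-1, 2)).
    return (r - 2) * with_j / (2 * among_others)


# Toy masks (hypothetical data): three similar segmentations and one outlier.
masks = [
    [1, 1, 1, 0, 0, 0],
    [1, 1, 1, 0, 0, 0],
    [1, 1, 0, 0, 0, 0],
    [0, 0, 0, 1, 1, 1],  # shares no voxel with the others
]
scores = [williams_index(masks, j) for j in range(len(masks))]
```

In this toy example the last mask overlaps with none of the others, so its index is 0, which is how an outlier would show up without any reference segmentation.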
