Data-Driven Rank Aggregation with Application to Grand Challenges

The growing number of challenges for the comparative evaluation of biomedical image analysis procedures reflects a need for unbiased assessment of state-of-the-art methodological advances. Moreover, translating novel image analysis procedures into the clinic ultimately requires rigorous validation and evaluation of alternative schemes, a task that is best outsourced to the international research community. Challenges increasingly use several metrics in parallel, each reflecting a different way to measure similarity to the ground truth. Because different metrics come with different scales and distributions, their values are often normalized or converted into a separate rank ordering per metric. This leaves the problem of combining these multiple rankings into a final score. Common solutions average or accumulate the rankings, which raises two questions: whether all metrics should be treated equally, and whether all of them are needed to assess closeness to truth. We address this issue with a data-driven method that automatically estimates weights for a set of metrics based on unsupervised rank aggregation. Our method requires no normalization procedures and makes no assumptions about metric distributions. We use an iterative perturbation scheme to measure the sensitivity of each metric to small changes in the input data, so that the most robust metrics contribute most to the overall ranking. We show on real anatomical data that our weighting scheme can dramatically change the final ranking.
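
The abstract does not give the weighting formula, so the following Python code is only a hypothetical sketch of the general idea, not the authors' algorithm. It makes three assumptions:

- Every metric is oriented so that higher scores are better (distances such as Hausdorff are negated first).
- Each metric has already been recomputed on a few perturbed versions of the input data.
- Robustness is scored by the mean Kendall tau between a metric's original ranking and its perturbed rankings.

The functions `to_ranks`, `kendall_tau`, `robustness_weights` and `aggregate` are names introduced here for illustration. Because only ranks are compared, the metrics never need a common scale.

```python
"""Hypothetical sketch: robustness-weighted aggregation of per-metric rankings.

Not the paper's algorithm. A metric's weight is the mean Kendall tau between
its ranking on the original data and its rankings on perturbed data, so metrics
whose ranking is stable under perturbation contribute more. Only ranks are
compared, so metric scales never need to be normalized.
"""
from itertools import combinations
from typing import Dict, List, Sequence


def to_ranks(scores: Sequence[float]) -> List[float]:
    """Rank participants for one metric (1 = best, higher score is better).

    Tied scores share the mean of the ranks they occupy.
    """
    order = sorted(range(len(scores)), key=lambda i: -scores[i])
    ranks = [0.0] * len(scores)
    start = 0
    while start < len(order):
        end = start
        # Extend the block while the next participant has the same score.
        while end + 1 < len(order) and scores[order[end + 1]] == scores[order[start]]:
            end += 1
        mean_rank = (start + end) / 2 + 1
        for k in range(start, end + 1):
            ranks[order[k]] = mean_rank
        start = end + 1
    return ranks


def kendall_tau(a: Sequence[float], b: Sequence[float]) -> float:
    """Kendall tau-a between two rankings of the same (at least two) participants."""
    concordant = discordant = 0
    for i, j in combinations(range(len(a)), 2):
        s = (a[i] - a[j]) * (b[i] - b[j])
        if s > 0:
            concordant += 1
        elif s < 0:
            discordant += 1
    n_pairs = len(a) * (len(a) - 1) / 2
    return (concordant - discordant) / n_pairs


def robustness_weights(
    original: Dict[str, Sequence[float]],
    perturbed: Dict[str, List[Sequence[float]]],
) -> Dict[str, float]:
    """Weight each metric by the stability of its ranking under perturbation.

    original[m]: scores of all participants for metric m on the unperturbed data.
    perturbed[m]: one score list per perturbation trial (same participant order).
    Negative mean taus are clipped to 0 and the weights are normalized to sum to 1.
    """
    raw = {}
    for m, scores in original.items():
        base = to_ranks(scores)
        taus = [kendall_tau(base, to_ranks(trial)) for trial in perturbed[m]]
        raw[m] = max(0.0, sum(taus) / len(taus))
    total = sum(raw.values())
    if total == 0:
        # No metric is stable at all: fall back to equal weights.
        return {m: 1 / len(raw) for m in raw}
    return {m: w / total for m, w in raw.items()}


def aggregate(original: Dict[str, Sequence[float]], weights: Dict[str, float]) -> List[float]:
    """Weighted mean rank per participant (weights sum to 1; lower is better)."""
    ranks = {m: to_ranks(s) for m, s in original.items()}
    n = len(next(iter(ranks.values())))
    return [sum(weights[m] * ranks[m][i] for m in ranks) for i in range(n)]


if __name__ == "__main__":
    # Hypothetical scores for three participants; Hausdorff distance is negated
    # so that higher is better for both metrics.
    original = {"dice": [0.90, 0.85, 0.80], "neg_hausdorff": [-5.0, -4.0, -6.0]}
    perturbed = {
        "dice": [[0.89, 0.86, 0.79], [0.91, 0.84, 0.81]],           # ranking unchanged
        "neg_hausdorff": [[-4.0, -5.0, -6.0], [-6.0, -4.5, -5.0]],  # ranking changes
    }
    equal = {m: 1 / len(original) for m in original}
    weights = robustness_weights(original, perturbed)
    print("equal weights:     ", aggregate(original, equal))
    print("robustness weights:", weights, aggregate(original, weights))
```

In this toy example, equal weights (plain rank averaging) leave the first two participants tied at a mean rank of 1.5. The robustness weights (0.75 for `dice`, 0.25 for `neg_hausdorff`) separate them, because only the `dice` ranking survives perturbation unchanged. Kendall tau is just one possible stability measure. A Spearman correlation or a footrule distance would fit the same structure.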
