Rethinking Generalization: The Impact of Annotation Style on Medical Image Segmentation

Generalization is an important attribute of machine learning models, particularly for those deployed in a medical context, where unreliable predictions can have real-world consequences. While the failure of models to generalize across datasets is typically attributed to a mismatch in the data distributions, performance gaps are often a consequence of biases in the “ground-truth” label annotations. This is particularly important in medical image segmentation of pathological structures (e.g. lesions), where the annotation process is much more subjective and is affected by a number of underlying factors, including the annotation protocol, rater education and experience, and clinical aims. In this paper, we show that modeling annotation biases, rather than ignoring them, offers a promising way of accounting for differences in annotation style across datasets. To this end, we propose a generalized conditioning framework to (1) learn and account for different annotation styles across multiple datasets using a single model, (2) identify similar annotation styles across different datasets so that they can be aggregated effectively, and (3) fine-tune a fully trained model to a new annotation style with just a few samples. We also present an image-conditioning approach to model annotation styles that correlate with specific image features, potentially making detection biases easier to identify.
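To make the idea of conditioning a single model on annotation style concrete, the following is a minimal sketch and not the paper's implementation. The abstract does not state the conditioning mechanism, so the sketch assumes a conditional instance normalization layer in which each annotation style (for example, each source dataset) owns its own scale and shift parameters while all other weights are shared. The class name `StyleConditionedNorm`, the method `add_style`, and all parameter names are hypothetical.

```python
import numpy as np


class StyleConditionedNorm:
    """Instance normalization with one (scale, shift) pair per annotation style.

    Assumption: this conditioning mechanism is illustrative only; the abstract
    does not specify how the framework conditions on style.
    """

    def __init__(self, num_styles, num_channels, eps=1e-5):
        # One row of affine parameters per style; rows start as the identity map.
        self.gamma = np.ones((num_styles, num_channels))
        self.beta = np.zeros((num_styles, num_channels))
        self.eps = eps

    def __call__(self, x, style_id):
        # x has shape (channels, height, width).
        mean = x.mean(axis=(1, 2), keepdims=True)
        var = x.var(axis=(1, 2), keepdims=True)
        x_hat = (x - mean) / np.sqrt(var + self.eps)
        # Select the affine parameters that belong to the requested style.
        gamma = self.gamma[style_id][:, None, None]
        beta = self.beta[style_id][:, None, None]
        return gamma * x_hat + beta

    def add_style(self, init_from):
        # Append parameters for a new style, copied from an existing one.
        # In a fine-tuning setting only this new row would be updated.
        self.gamma = np.vstack([self.gamma, self.gamma[init_from:init_from + 1]])
        self.beta = np.vstack([self.beta, self.beta[init_from:init_from + 1]])
        return self.gamma.shape[0] - 1


# Usage: the same feature map normalized under two different styles.
layer = StyleConditionedNorm(num_styles=2, num_channels=3)
layer.gamma[1] = 2.0
layer.beta[1] = 1.0
features = np.random.default_rng(0).normal(size=(3, 4, 4))
out_style0 = layer(features, 0)
out_style1 = layer(features, 1)
new_style = layer.add_style(init_from=1)
```

Under this assumption, adapting to a new annotation style with few samples would amount to training only the newly added row of parameters, and comparing learned rows across datasets is one plausible way to judge whether two styles are similar.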
