A Principled Approach to Failure Analysis and Model Repairment: Demonstration in Medical Imaging

Machine learning models commonly exhibit unexpected failures post-deployment due to either data shifts or uncommon situations in the training environment. Domain experts typically go through the tedious process of inspecting the failure cases manually, identifying failure modes and then attempting to fix the model. In this work, we aim to standardise and bring principles to this process by answering two critical questions: (i) how do we know that we have identified meaningful and distinct failure types, and (ii) how can we validate that a model has, indeed, been repaired? We suggest that the quality of the identified failure types can be validated by measuring the intra- and inter-type generalisation after fine-tuning, and we introduce metrics to compare different subtyping methods. Furthermore, we argue that a model can be considered repaired if it achieves high accuracy on the failure types while retaining performance on the previously correct data. We combine these two ideas into a principled framework for evaluating the quality of both the identified failure subtypes and model repairment. We evaluate its utility on a classification task and an object detection task. Our code is available at https://github.com/Rokken-lab6/Failure-Analysis-and-Model-Repairment
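As an illustration of the two ideas above, the following is a minimal sketch and not the implementation from the linked repository. It assumes a hypothetical matrix `acc[i][j]` holding the accuracy on held-out samples of failure type `j` after fine-tuning on samples of type `i`, and it summarises intra-type generalisation as the mean of the diagonal and inter-type generalisation as the mean of the off-diagonal entries. The function names and the thresholds in `is_repaired` are assumptions chosen for the example; the abstract does not specify the exact metrics.

```python
from statistics import mean


def generalisation_summary(acc):
    """Summarise intra- and inter-type generalisation.

    acc[i][j] is assumed to be the accuracy on held-out samples of
    failure type j after fine-tuning on samples of failure type i.
    """
    n = len(acc)
    # Intra-type: fine-tune and evaluate on the same failure type.
    intra = mean(acc[i][i] for i in range(n))
    # Inter-type: fine-tune on one type, evaluate on a different one.
    inter = mean(acc[i][j] for i in range(n) for j in range(n) if i != j)
    return {"intra": intra, "inter": inter, "gap": intra - inter}


def is_repaired(acc_failure_types, acc_prev_correct,
                min_failure_acc=0.9, min_retained_acc=0.95):
    """Check the two repair conditions with illustrative thresholds.

    acc_failure_types: accuracy of the repaired model on each failure type.
    acc_prev_correct: accuracy on data the original model classified correctly.
    Both thresholds are hypothetical and would be set per application.
    """
    fixes_failures = min(acc_failure_types) >= min_failure_acc
    retains_performance = acc_prev_correct >= min_retained_acc
    return fixes_failures and retains_performance


# Toy numbers, invented for the example only.
acc = [[0.9, 0.3],
       [0.2, 0.8]]
summary = generalisation_summary(acc)
repaired = is_repaired([0.92, 0.95], acc_prev_correct=0.97)
forgot = is_repaired([0.92, 0.95], acc_prev_correct=0.80)
```

Under this reading, a large gap between the intra- and inter-type values would suggest that the identified failure types are distinct, since fine-tuning on one type does not transfer to the others. The second check fails whenever fixing the failure types comes at the cost of the previously correct data, as in the `forgot` case.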
