A Two-Stream Mutual Attention Network for Semi-Supervised Biomedical Segmentation with Noisy Labels

Learning-based methods suffer from a shortage of clean annotations, especially in biomedical segmentation. Although many semi-supervised methods have been proposed to provide extra training data, automatically generated labels are usually too noisy to retrain models effectively. In this paper, we propose a Two-Stream Mutual Attention Network (TSMAN) that weakens the influence of back-propagated gradients caused by incorrect labels, thereby making the network robust to noisy data. The proposed TSMAN consists of two sub-networks that are connected by three types of attention models at different layers. Each attention model aims to indicate potentially incorrect gradients at a certain layer for both sub-networks by comparing the features they infer from the same input. To this end, the attention models are designed based on a propagation analysis of noisy gradients at different layers, which allows them to effectively discover incorrect labels and weaken their influence during the parameter update process. By exchanging multi-level features within the two-stream architecture, the noisy gradients in each sub-network are suppressed, reducing the effect of incorrect labels. Furthermore, a hierarchical distillation is developed to provide reliable pseudo labels for unlabeled data, which further boosts the performance of TSMAN. Experiments on the HVSMR 2016 and BRATS 2015 benchmarks demonstrate that our semi-supervised learning framework surpasses the state-of-the-art fully-supervised results.
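
To make the cross-stream idea concrete, below is a minimal PyTorch sketch (not the authors' released code) of how an attention gate at one layer might compare the feature maps of the two sub-networks and down-weight positions where the streams disagree, so that gradients flowing back through those positions are attenuated. The module name CrossStreamGate, the gating design, and all tensor shapes are illustrative assumptions, not details taken from the paper.

import torch
import torch.nn as nn

class CrossStreamGate(nn.Module):
    """Per-voxel gate computed from the features of two parallel streams (illustrative sketch)."""

    def __init__(self, channels):
        super().__init__()
        # 1x1x1 convolution mapping the concatenated features to a single gate map
        self.score = nn.Conv3d(2 * channels, 1, kernel_size=1)

    def forward(self, feat_a, feat_b):
        # Concatenate the two streams' features from the same layer and input
        fused = torch.cat([feat_a, feat_b], dim=1)
        # Sigmoid gate in [0, 1]: small values suppress likely-unreliable positions
        gate = torch.sigmoid(self.score(fused))
        # Re-weighting both streams shrinks the gradients at suppressed positions
        return feat_a * gate, feat_b * gate

if __name__ == "__main__":
    gate = CrossStreamGate(channels=8)
    a = torch.randn(1, 8, 16, 16, 16)   # stream A features (N, C, D, H, W)
    b = torch.randn(1, 8, 16, 16, 16)   # stream B features
    out_a, out_b = gate(a, b)
    print(out_a.shape, out_b.shape)     # both torch.Size([1, 8, 16, 16, 16])

In this sketch the gate multiplies both streams' activations, so any position the gate pushes toward zero contributes little to the loss and therefore little to the back-propagated gradient, which is the general effect the abstract describes.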
