Improving Medical Image Classification with Label Noise Using Dual-uncertainty Estimation

Deep neural networks are data-driven, so label noise can have a marked impact on model performance. Recent methods have achieved strong robustness on classic image recognition tasks even at high noise rates. In medical applications, learning from datasets with label noise is more challenging, since medical imaging datasets tend to contain instance-dependent noise (IDN) and suffer from high inter-observer variability. In this paper, we systematically discuss the two common types of label noise in medical images: disagreement label noise, which arises from inconsistent expert opinions, and single-target label noise, which arises from biased aggregation of individual annotations. We then propose an uncertainty estimation-based framework to handle these two types of label noise in medical image classification. We design a dual-uncertainty estimation approach that measures disagreement label noise and single-target label noise via an improved Direct Uncertainty Prediction and Monte-Carlo Dropout. We then introduce a boosting-based curriculum training procedure for robust learning. We demonstrate the effectiveness of our method through extensive experiments on three diseases (skin lesions, prostate cancer, and retinal diseases) with both synthesized and real-world label noise. We also release a large re-engineered database consisting of annotations from more than ten ophthalmologists, together with an unbiased gold-standard dataset for evaluation and benchmarking. The dataset is available at https://mmai.group/peoples/julie/.
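Here is a rough illustration of the two quantities named above. The sketch computes two things. The first is a disagreement score from several annotators' labels for one image. This is the kind of per-image target a Direct Uncertainty Prediction model could be trained to regress. The second is a Monte-Carlo Dropout predictive entropy. This is a minimal sketch under stated assumptions. The disagreement definition used here, 1 - Σ_k p_k² (the chance that two annotators drawn independently disagree), is an assumption. The linear classifier, the function names, and the default parameters are all hypothetical. None of this is the paper's implementation.

```python
import numpy as np

def disagreement_score(annotations, num_classes):
    """Assumed disagreement measure: probability that two annotators, drawn
    independently with replacement, give different labels (1 - sum_k p_k^2)."""
    counts = np.bincount(np.asarray(annotations), minlength=num_classes)
    p = counts / counts.sum()          # empirical label distribution
    return 1.0 - float(np.sum(p ** 2))

def softmax(z):
    z = z - z.max(axis=-1, keepdims=True)  # subtract max for numerical stability
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def mc_dropout_entropy(x, W, b, num_passes=50, drop_rate=0.5, seed=0):
    """Predictive entropy of the mean softmax over stochastic forward passes of a
    hypothetical linear classifier, with inverted dropout applied to its input."""
    rng = np.random.default_rng(seed)
    probs = []
    for _ in range(num_passes):
        mask = rng.random(x.shape) >= drop_rate  # keep each unit with prob 1 - drop_rate
        x_drop = x * mask / (1.0 - drop_rate)    # rescale so expected input is unchanged
        probs.append(softmax(x_drop @ W + b))    # one stochastic prediction, shape (n, c)
    mean_p = np.mean(probs, axis=0)              # Monte-Carlo estimate of predictive dist.
    return -np.sum(mean_p * np.log(mean_p + 1e-12), axis=-1)  # entropy per sample

# Usage with hypothetical labels from five annotators (3 classes):
print(disagreement_score([0, 0, 0, 0, 0], 3))  # 0.0, unanimous
print(disagreement_score([0, 0, 1, 1, 2], 3))  # ≈ 0.64
```

In a real network, MC-Dropout keeps the model's own dropout layers active at inference time. Here the input is masked instead, which only keeps the sketch self-contained. Averaging the softmax outputs before taking the entropy captures both how uncertain each pass is and how much the passes disagree with each other.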
