Inaccurate Labels in Weakly-Supervised Deep Learning: Automatic Identification and Correction and Their Impact on Classification Performance

In data-driven deep learning-based modeling, data quality may substantially influence classification performance. Correct data labeling for deep learning modeling is critical. In weakly-supervised learning, a challenge lies in dealing with potentially inaccurate or mislabeled training data. In this paper, we proposed an automated methodological framework to identify mislabeled data using two metric functions, namely, Cross-entropy Loss that indicates divergence between a prediction and ground truth, and Influence function that reflects the dependence of a model on data. After correcting the identified mislabels, we measured their impact on the classification performance. We also compared the mislabeling effects in three experiments on two different real-world clinical questions. A total of 10,500 images were studied in the contexts of clinical breast density category classification and breast cancer malignancy diagnosis. We used intentionally flipped labels as mislabels to evaluate the proposed method at a varying proportion of mislabeled data included in model training. We also compared the effects of our method to two published schemes for breast density category classification. Experiment results show that when the dataset contains 10% of mislabeled data, our method can automatically identify up to 98% of these mislabeled data by examining/checking the top 30% of the full dataset. Furthermore, we show that correcting the identified mislabels leads to an improvement in the classification performance. Our method provides a feasible solution for weakly-supervised deep learning modeling in dealing with inaccurate labels.

[1]  S. Weisberg,et al.  Residuals and Influence in Regression , 1982 .

[2]  Ahmed Hosny,et al.  Artificial intelligence in radiology , 2018, Nature Reviews Cancer.

[3]  A. Ng,et al.  Deep learning for chest radiograph diagnosis: A retrospective comparison of the CheXNeXt algorithm to practicing radiologists , 2018, PLoS medicine.

[4]  Joel H. Saltz,et al.  Pancreatic Cancer Detection in Whole Slide Images Using Noisy Label Annotations , 2019, MICCAI.

[5]  Xiaogang Wang,et al.  Learning from massive noisy labeled data for image classification , 2015, 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[6]  Junfeng Yang,et al.  DeepXplore: Automated Whitebox Testing of Deep Learning Systems , 2017, SOSP.

[7]  Hayit Greenspan,et al.  Training a neural network based on unreliable human annotation of medical images , 2018, 2018 IEEE 15th International Symposium on Biomedical Imaging (ISBI 2018).

[8]  R. Dennis Cook,et al.  Detection of Influential Observation in Linear Regression , 2000, Technometrics.

[9]  Richard H. Moore,et al.  Current Status of the Digital Database for Screening Mammography , 1998, Digital Mammography / IWDM.

[10]  Dimitrios I. Fotiadis,et al.  Medical data quality assessment: On the development of an automated framework for medical data curation , 2019, Comput. Biol. Medicine.

[11]  B. Keller,et al.  Estimation of breast percent density in raw and processed full field digital mammography images via adaptive fuzzy c-means clustering and support vector machine segmentation. , 2012, Medical physics.

[12]  Ralph Roskies,et al.  Bridges: a uniquely flexible HPC resource for new communities and data analytics , 2015, XSEDE.

[13]  Aly A. Mohamed,et al.  Deep Learning to Distinguish Recalled but Benign Mammography Images in Breast Cancer Screening , 2018, Clinical Cancer Research.

[14]  Christopher De Sa,et al.  Data Programming: Creating Large Training Sets, Quickly , 2016, NIPS.

[15]  Yuan Yu,et al.  TensorFlow: A system for large-scale machine learning , 2016, OSDI.

[16]  Luca Antiga,et al.  Automatic differentiation in PyTorch , 2017 .

[17]  Yoshua Bengio,et al.  How transferable are features in deep neural networks? , 2014, NIPS.

[18]  Lin Shi,et al.  Histogram-based normalization technique on human brain magnetic resonance images from different acquisitions , 2015, Biomedical engineering online.

[19]  Swami Sankaranarayanan,et al.  Learning From Noisy Labels by Regularized Estimation of Annotator Confusion , 2019, 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

[20]  Ronald M. Summers,et al.  ChestX-ray: Hospital-Scale Chest X-ray Database and Benchmarks on Weakly Supervised Classification and Localization of Common Thorax Diseases , 2019, Deep Learning and Convolutional Neural Networks for Medical Imaging and Clinical Informatics.

[21]  Zhuowen Tu,et al.  Aggregated Residual Transformations for Deep Neural Networks , 2016, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[22]  Sergey Ioffe,et al.  Rethinking the Inception Architecture for Computer Vision , 2015, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[23]  Li Fei-Fei,et al.  ImageNet: A large-scale hierarchical image database , 2009, CVPR.

[24]  Steven Euijong Whang,et al.  A Survey on Data Collection for Machine Learning: A Big Data - AI Integration Perspective , 2018, IEEE Transactions on Knowledge and Data Engineering.

[25]  Jacob Goldberger,et al.  Training deep neural-networks using a noise adaptation layer , 2016, ICLR.

[26]  Richard H. Moore,et al.  THE DIGITAL DATABASE FOR SCREENING MAMMOGRAPHY , 2007 .

[27]  X. Castells,et al.  Inter- and intraradiologist variability in the BI-RADS assessment and breast density categories for screening mammograms. , 2012, The British journal of radiology.

[28]  Trevor Darrell,et al.  DeCAF: A Deep Convolutional Activation Feature for Generic Visual Recognition , 2013, ICML.

[29]  Jian Sun,et al.  Deep Residual Learning for Image Recognition , 2015, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[30]  Tinghuai Ma,et al.  Novel mislabeled training data detection algorithm , 2016, Neural Computing and Applications.

[31]  Percy Liang,et al.  Understanding Black-box Predictions via Influence Functions , 2017, ICML.

[32]  Geoffrey E. Hinton,et al.  ImageNet classification with deep convolutional neural networks , 2012, Commun. ACM.

[33]  B. Keller,et al.  Preliminary evaluation of the publicly available Laboratory for Breast Radiodensity Assessment (LIBRA) software tool: comparison of fully automated area and volumetric density measures in a case–control study with digital mammography , 2015, Breast Cancer Research.

[34]  Bin Yang,et al.  Learning to Reweight Examples for Robust Deep Learning , 2018, ICML.

[35]  N. Keating,et al.  New Federal Requirements to Inform Patients About Breast Density: Will They Help Patients? , 2019, JAMA.

[36]  Samy Bengio,et al.  Understanding deep learning requires rethinking generalization , 2016, ICLR.

[37]  Zhi-Hua Zhou,et al.  A brief introduction to weakly supervised learning , 2018 .

[38]  Dag Pavic,et al.  Effects of Changes in BI-RADS Density Assessment Guidelines (Fourth Versus Fifth Edition) on Breast Density Assessment: Intra- and Interreader Agreements and Density Distribution. , 2016, AJR. American journal of roentgenology.

[39]  Carla E. Brodley,et al.  Identifying Mislabeled Training Data , 1999, J. Artif. Intell. Res..

[40]  Yahong Luo,et al.  A deep learning method for classifying mammographic breast density categories , 2018, Medical physics.