Data Shapley Value for Handling Noisy Labels: An Application in Screening Covid-19 Pneumonia from Chest CT Scans

A long-standing challenge for deep learning models is how to handle noisy labels, especially in applications where human lives are at stake. The data Shapley value (SV), a valuation approach from cooperative game theory, offers a principled way to tackle noisy labels. The data SV is used together with a learning model and an evaluation metric to quantify each training point's contribution to the model's performance. The SV of a data point is not unique, however: it depends on the learning model, the evaluation metric, and the other data points collaborating in the training game. The effects of using different evaluation metrics to compute the SV, detect noisy labels, and measure the importance of data points have not yet been thoroughly investigated. In this context, we performed a series of comparative analyses to assess the SV's ability to detect noisy input labels when measured by different evaluation metrics. Our experiments on COVID-19 chest CT images illustrate that, although the data SV can effectively identify noisy labels, the choice of evaluation metric can significantly influence its ability to identify noisy labels from different data classes. Specifically, we demonstrate that the SV greatly depends on the associated evaluation metric.
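To make the dependence on the evaluation metric concrete, the following is a minimal sketch of a permutation-sampling (Monte Carlo) estimate of the data SV, computed once with accuracy and once with positive-class recall. It is an illustration under stated assumptions, not the pipeline used in this work: the data are synthetic 2-D points rather than CT images, a nearest-centroid classifier stands in for the learning model, and every name here (utility, monte_carlo_shapley, recall_positive, and so on) is hypothetical.

```python
import numpy as np

def nearest_centroid_predict(X_train, y_train, X_test):
    """Stand-in learner (assumption): assign each test point to the closer class centroid."""
    c0 = X_train[y_train == 0].mean(axis=0)
    c1 = X_train[y_train == 1].mean(axis=0)
    d0 = np.linalg.norm(X_test - c0, axis=1)
    d1 = np.linalg.norm(X_test - c1, axis=1)
    return (d1 < d0).astype(int)

def accuracy(y_true, y_pred):
    return float(np.mean(y_true == y_pred))

def recall_positive(y_true, y_pred):
    """Fraction of class-1 validation points predicted as class 1."""
    return float(np.mean(y_pred[y_true == 1] == 1))

def utility(idx, X, y, X_val, y_val, metric):
    """Validation score of the learner trained on the subset idx."""
    ys = y[idx]
    if len(np.unique(ys)) < 2:
        # Assumption: with fewer than two classes present, fall back to a
        # constant prediction (the only class seen, or 0 for the empty set).
        label = int(ys[0]) if len(ys) > 0 else 0
        return metric(y_val, np.full(len(y_val), label))
    return metric(y_val, nearest_centroid_predict(X[idx], ys, X_val))

def monte_carlo_shapley(X, y, X_val, y_val, metric, n_perms=100, seed=0):
    """Average each point's marginal contribution over random permutations."""
    rng = np.random.default_rng(seed)
    n = len(y)
    values = np.zeros(n)
    for _ in range(n_perms):
        perm = rng.permutation(n)
        prev = utility(perm[:0], X, y, X_val, y_val, metric)
        for k, i in enumerate(perm):
            cur = utility(perm[:k + 1], X, y, X_val, y_val, metric)
            values[i] += cur - prev  # marginal gain from adding point i
            prev = cur
    return values / n_perms

# Synthetic two-class data with three deliberately flipped training labels.
rng = np.random.default_rng(42)
X_tr = np.vstack([rng.normal(-2, 1, (12, 2)), rng.normal(2, 1, (12, 2))])
y_noisy = np.array([0] * 12 + [1] * 12)
flipped = np.array([0, 1, 12])
y_noisy[flipped] = 1 - y_noisy[flipped]
X_val = np.vstack([rng.normal(-2, 1, (30, 2)), rng.normal(2, 1, (30, 2))])
y_val = np.array([0] * 30 + [1] * 30)

# Same data and learner, two different evaluation metrics.
sv_acc = monte_carlo_shapley(X_tr, y_noisy, X_val, y_val, accuracy)
sv_rec = monte_carlo_shapley(X_tr, y_noisy, X_val, y_val, recall_positive)

# Points with the lowest values are the candidates for noisy labels.
print("lowest-valued under accuracy:", np.argsort(sv_acc)[:3])
print("lowest-valued under recall:  ", np.argsort(sv_rec)[:3])
```

Because only the metric argument changes between the two calls, any difference in the two rankings comes from the evaluation metric alone. A class-specific metric such as recall rewards points that help one class and can penalize points that help the other, which is one way the ranking of suspected noisy labels can differ between classes.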
