Paper information - CheXclusion: Fairness gaps in deep chest X-ray classifiers

CheXclusion: Fairness gaps in deep chest X-ray classifiers

Machine learning systems have received much attention recently for their ability to achieve expert-level performance on clinical tasks, particularly in medical imaging. Here, we examine the extent to which state-of-the-art deep learning classifiers trained to yield diagnostic labels from X-ray images are biased with respect to protected attributes. We train convolutional neural networks to predict 14 diagnostic labels in 3 prominent public chest X-ray datasets (MIMIC-CXR, Chest-Xray8, and CheXpert), as well as in a multi-site aggregation of all three datasets. We evaluate the TPR disparity, defined as the difference in true positive rates (TPR), among protected attributes such as patient sex, age, race, and insurance type, where insurance type serves as a proxy for socioeconomic status. We demonstrate that TPR disparities exist in the state-of-the-art classifiers in all datasets, for all clinical tasks, and for all subgroups. The multi-source dataset yields the smallest disparities, suggesting one way to reduce bias. We find that TPR disparities are not significantly correlated with a subgroup's proportional disease burden. As clinical models move from papers to products, we encourage clinical decision makers to carefully audit for algorithmic disparities prior to deployment. Our code is available online.
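The TPR disparity described in the abstract can be illustrated with a minimal sketch. The abstract only states that the disparity is a difference in true positive rates, so the reference point used here (the median TPR across subgroups) is an assumption, as are the function names `true_positive_rate` and `tpr_disparities` and the toy data. For a two-group attribute, this choice gives half the pairwise difference between the groups.

```python
from statistics import median


def true_positive_rate(y_true, y_pred):
    """TPR = TP / (TP + FN). Returns None if there are no positive cases."""
    preds_on_positives = [p for t, p in zip(y_true, y_pred) if t == 1]
    if not preds_on_positives:
        return None
    return sum(preds_on_positives) / len(preds_on_positives)


def tpr_disparities(y_true, y_pred, groups):
    """Map each subgroup to (its TPR - median TPR over subgroups) for one label.

    Assumption: the median subgroup TPR is used as the reference point.
    Subgroups with no positive cases are skipped.
    """
    by_group = {}
    for t, p, g in zip(y_true, y_pred, groups):
        truths, preds = by_group.setdefault(g, ([], []))
        truths.append(t)
        preds.append(p)

    tprs = {g: true_positive_rate(ts, ps) for g, (ts, ps) in by_group.items()}
    tprs = {g: v for g, v in tprs.items() if v is not None}

    reference = median(tprs.values())
    return {g: tpr - reference for g, tpr in tprs.items()}


if __name__ == "__main__":
    # Hypothetical binary labels and predictions for a single diagnostic label.
    y_true = [1, 1, 1, 1, 1, 1, 0, 0]
    y_pred = [1, 1, 0, 1, 0, 0, 1, 0]
    sex = ["F", "F", "F", "M", "M", "M", "F", "M"]
    for group, gap in tpr_disparities(y_true, y_pred, sex).items():
        print(f"{group}: TPR gap = {gap:+.3f}")
```

A positive gap means the classifier detects the condition more often in that subgroup than in the reference; a negative gap indicates underdiagnosis relative to the reference. In a multi-label setting the same computation would be repeated per diagnostic label and per protected attribute.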
