Fairness in Cardiac MR Image Analysis: An Investigation of Bias Due to Data Imbalance in Deep Learning Based Segmentation

The subject of ‘fairness’ in artificial intelligence (AI) refers to assessing AI algorithms for potential bias based on demographic characteristics such as race and gender, and to developing algorithms that address this bias. Most applications to date have been in computer vision, although some work in healthcare has started to emerge. The use of deep learning (DL) in cardiac MR segmentation has led to impressive results in recent years, and such techniques are starting to be translated into clinical practice. However, no work has yet investigated the fairness of such models. In this work, we perform such an analysis for racial/gender groups, focusing on the problem of training data imbalance, using an nnU-Net model trained and evaluated on cine short-axis cardiac MR data from the UK Biobank dataset, consisting of 5,903 subjects from 6 different racial groups. We find statistically significant differences in Dice performance among the racial groups. To reduce the racial bias, we investigated three strategies: (1) stratified batch sampling, in which batch sampling is stratified to ensure balance between racial groups; (2) fair meta-learning for segmentation, in which a DL classifier is trained to classify race and jointly optimized with the segmentation model; and (3) protected group models, in which a separate segmentation model is trained for each racial group. We also compared the results to the scenario of a perfectly balanced database. To assess fairness, we used the standard deviation (SD) and skewed error ratio (SER) of the average Dice values. Our results demonstrate that the racial bias results from the use of imbalanced training data, and that all proposed bias mitigation strategies improved fairness, with the best SD and SER resulting from the use of protected group models.
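The two fairness metrics used above can be computed directly from per-group average Dice scores. As a minimal sketch (the group names and Dice values below are purely illustrative, not results from the paper), the SD is taken over the group-average Dice values, and the SER is assumed here to be the ratio of the largest to the smallest group error, with error defined as 1 − Dice:

```python
import numpy as np

# Hypothetical group-average Dice scores (illustrative values only)
group_dice = {"group_A": 0.93, "group_B": 0.91, "group_C": 0.88}

dice = np.array(list(group_dice.values()))

# Standard deviation of group-average Dice: lower values indicate fairer performance
sd = dice.std()

# Skewed error ratio: max group error over min group error, where error = 1 - Dice;
# a value closer to 1 indicates fairer performance across groups
error = 1.0 - dice
ser = error.max() / error.min()
```

Under this definition, a perfectly fair model would have SD = 0 and SER = 1, so both metrics quantify how far per-group performance deviates from parity.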
