Understanding and Mitigating Accuracy Disparity in Regression

With the widespread deployment of large-scale prediction systems in high-stakes domains, e.g., face recognition and criminal justice, disparity in prediction accuracy between different demographic subgroups has called for a fundamental understanding of the source of such disparity and for algorithmic interventions to mitigate it. In this paper, we study the accuracy disparity problem in regression. We first propose an error decomposition theorem, which decomposes the accuracy disparity into the distance between marginal label distributions and the distance between conditional representations, to help explain why such accuracy disparity appears in practice. Motivated by this error decomposition and the general idea of distribution alignment with statistical distances, we then propose an algorithm to reduce this disparity and analyze the game-theoretic optima of the proposed objective functions. To corroborate our theoretical findings, we conduct experiments on five benchmark datasets. The experimental results suggest that our proposed algorithms can effectively mitigate accuracy disparity while maintaining the predictive power of the regression models.
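To make the "distribution alignment with statistical distances" idea concrete, below is a minimal PyTorch sketch of adversarial alignment of group representations for a regression model. It is not the authors' exact algorithm: the synthetic data, the network names (feature, regressor, critic), the Wasserstein-style critic with weight clipping, and the trade-off weight LAMBDA are all illustrative assumptions, and the alignment here is over marginal (not label-conditional) representations.

```python
# A minimal sketch (not the paper's exact algorithm) of reducing accuracy
# disparity via adversarial distribution alignment of learned representations.
# Assumptions: synthetic data, a binary group attribute, a Wasserstein-style
# critic as the statistical distance; all names and hyperparameters are
# illustrative.
import torch
import torch.nn as nn

torch.manual_seed(0)

# Synthetic data: features x, real-valued targets y, binary group attribute a.
n, d = 512, 10
x = torch.randn(n, d)
a = torch.randint(0, 2, (n,))
y = x @ torch.randn(d, 1) + 0.5 * a.float().unsqueeze(1) + 0.1 * torch.randn(n, 1)

feature = nn.Sequential(nn.Linear(d, 16), nn.ReLU())      # shared representation
regressor = nn.Linear(16, 1)                               # predicts y from z
critic = nn.Sequential(nn.Linear(16, 16), nn.ReLU(), nn.Linear(16, 1))

opt_main = torch.optim.Adam(
    list(feature.parameters()) + list(regressor.parameters()), lr=1e-3)
opt_critic = torch.optim.Adam(critic.parameters(), lr=1e-3)

LAMBDA = 0.5  # trade-off between predictive loss and alignment penalty (assumed)
mse = nn.MSELoss()

for step in range(200):
    # Critic step: estimate the distance between the two groups'
    # representation distributions (WGAN-style, with weight clipping).
    z = feature(x).detach()
    z0, z1 = z[a == 0], z[a == 1]
    gap = critic(z0).mean() - critic(z1).mean()
    opt_critic.zero_grad()
    (-gap).backward()
    opt_critic.step()
    for p in critic.parameters():
        p.data.clamp_(-0.1, 0.1)

    # Main step: minimize regression error plus the estimated distance,
    # pushing the feature extractor to align the two groups.
    z = feature(x)
    z0, z1 = z[a == 0], z[a == 1]
    align = critic(z0).mean() - critic(z1).mean()
    loss = mse(regressor(z), y) + LAMBDA * align
    opt_main.zero_grad()
    loss.backward()
    opt_main.step()
```

Increasing LAMBDA trades predictive accuracy for smaller representation distance between groups, which is the tension the paper's game-theoretic analysis is meant to characterize.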
