Multi-Differential Fairness Auditor for Black Box Classifiers

Machine learning algorithms are increasingly involved in sensitive decision-making processes with adverse implications for individuals. This paper presents mdfa, an approach that identifies the characteristics of the victims of a classifier's discrimination. We measure discrimination as a violation of multi-differential fairness, a guarantee that a black-box classifier's outcomes do not leak information about the sensitive attributes of any small group of individuals. We reduce the problem of identifying worst-case violations to matching distributions and predicting where sensitive attributes and classifier outcomes coincide. We apply mdfa to a recidivism risk assessment classifier and find that individuals identified as African-American with little criminal history are three times more likely to be rated at high risk of violent recidivism than similar individuals who are not identified as African-American.
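
The sketch below illustrates the two-step reduction described in the abstract; it is a minimal, illustrative reading rather than the authors' implementation. It assumes a binary sensitive attribute and binary black-box outcomes, and the variable names (X, s, y_hat), the propensity-score weighting, and the choice of a shallow decision tree as the auditor are all assumptions made for the example.

```python
# Hedged sketch of an mdfa-style audit: (1) match the feature distributions of
# the two sensitive groups via propensity-score weights, then (2) fit a simple
# learner to predict where the sensitive attribute and the black-box outcome
# coincide. Not the authors' code; models and names are illustrative.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.tree import DecisionTreeClassifier

def audit(X, s, y_hat, max_depth=3):
    """X: (n, d) features; s: (n,) binary sensitive attribute;
    y_hat: (n,) binary outcomes of the audited black-box classifier."""
    # Step 1: distribution matching. Inverse-propensity weights make the two
    # sensitive groups comparable on the observed features X.
    propensity = LogisticRegression(max_iter=1000).fit(X, s).predict_proba(X)[:, 1]
    propensity = np.clip(propensity, 1e-3, 1 - 1e-3)
    weights = np.where(s == 1, 1.0 / propensity, 1.0 / (1.0 - propensity))

    # Step 2: predict where the sensitive attribute and the outcome coincide.
    # A shallow tree keeps any flagged subgroup easy to describe.
    agree = (s == y_hat).astype(int)
    auditor = DecisionTreeClassifier(max_depth=max_depth)
    auditor.fit(X, agree, sample_weight=weights)

    # Weighted accuracy well above 1/2 indicates a region where outcomes leak
    # the sensitive attribute, i.e., a candidate fairness violation.
    acc = np.average(auditor.predict(X) == agree, weights=weights)
    return auditor, acc
```

In use, the leaves of the fitted auditor with high agreement rates describe candidate subgroups (e.g., feature ranges such as "little criminal history") whose members' outcomes track the sensitive attribute most strongly, which is the kind of worst-case characterization the audit is meant to surface.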
