Avoiding Disparate Impact with Counterfactual Distributions

When a classification model is used to make predictions on individuals, it may be undesirable or illegal for the performance of the model to vary with a sensitive attribute such as race or gender. In this paper, we aim to evaluate and mitigate such disparities in model performance through a distributional approach. Given a black-box classifier that performs unevenly across sensitive groups, we consider a counterfactual distribution of input variables that minimizes the performance gap. We characterize properties of counterfactual distributions for common fairness criteria. We then present novel machinery to efficiently recover a counterfactual distribution from a sample of points drawn from its target population. We describe how counterfactual distributions can be used to avoid discrimination between protected groups by: (i) identifying proxy variables to omit in training; and (ii) building a preprocessor that can mitigate discrimination. We validate both use cases through experiments on a real-world dataset.
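To make the central object concrete, the sketch below approximates a counterfactual distribution by reweighting the observed minority sample until the classifier's accuracy gap to the majority group closes. The restriction to sample reweightings, and every function name here (`accuracy`, `counterfactual_weights`, `predict`), are assumptions made purely for illustration; the paper's actual recovery machinery is different.

```python
import numpy as np
from scipy.optimize import minimize

def accuracy(yhat, y, w=None):
    """(Weighted) accuracy; uniform weights when w is None."""
    w = np.ones(len(y)) if w is None else w
    return np.average(yhat == y, weights=w)

def counterfactual_weights(predict, X, y, s):
    """Approximate a counterfactual distribution as a reweighting of the
    minority sample (s == 1) that closes the accuracy gap to the majority
    group (s == 0). Illustrative sketch only, not the paper's machinery."""
    yhat = predict(X)
    target = accuracy(yhat[s == 0], y[s == 0])           # majority performance
    correct = (yhat[s == 1] == y[s == 1]).astype(float)  # minority correctness
    n = correct.size

    def gap_sq(w):
        w = np.abs(w) / np.abs(w).sum()                  # keep w a distribution
        return (np.dot(w, correct) - target) ** 2        # squared accuracy gap

    res = minimize(gap_sq, x0=np.ones(n) / n)
    return np.abs(res.x) / np.abs(res.x).sum()           # weights over minority rows
```

Comparing the reweighted minority sample against the original one (e.g., which feature values gain or lose mass) gives a rough analogue of the paper's first use case: features whose distribution must shift to close the gap are natural candidates for proxy variables.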
