Avoiding Disparate Impact with Counterfactual Distributions

When a classification model is used to make predictions on individuals, it may be undesirable or illegal for the performance of the model to vary with a sensitive attribute such as race or gender. In this paper, we aim to evaluate and mitigate such disparities in model performance through a distributional approach. Given a black-box classifier that performs unevenly across sensitive groups, we consider a counterfactual distribution of input variables that minimizes the performance gap. We characterize properties of counterfactual distributions for common fairness criteria. We then present novel machinery to efficiently recover a counterfactual distribution from a sample of points drawn from its target population. We describe how counterfactual distributions can be used to avoid discrimination between protected groups by: (i) identifying proxy variables to omit in training; and (ii) building a preprocessor that can mitigate discrimination. We validate both use cases through experiments on a real-world dataset.
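To make the central object concrete, the sketch below approximates a counterfactual distribution by reweighting the observed minority sample until the classifier's accuracy gap to the majority group closes. The restriction to sample reweightings, and every function name here (`accuracy`, `counterfactual_weights`, `predict`), are assumptions made purely for illustration; the paper's actual recovery machinery is different.

```python
import numpy as np
from scipy.optimize import minimize

def accuracy(yhat, y, w=None):
    """(Weighted) accuracy; uniform weights when w is None."""
    w = np.ones(len(y)) if w is None else w
    return np.average(yhat == y, weights=w)

def counterfactual_weights(predict, X, y, s):
    """Approximate a counterfactual distribution as a reweighting of the
    minority sample (s == 1) that closes the accuracy gap to the majority
    group (s == 0). Illustrative sketch only, not the paper's machinery."""
    yhat = predict(X)
    target = accuracy(yhat[s == 0], y[s == 0])           # majority performance
    correct = (yhat[s == 1] == y[s == 1]).astype(float)  # minority correctness
    n = correct.size

    def gap_sq(w):
        w = np.abs(w) / np.abs(w).sum()                  # keep w a distribution
        return (np.dot(w, correct) - target) ** 2        # squared accuracy gap

    res = minimize(gap_sq, x0=np.ones(n) / n)
    return np.abs(res.x) / np.abs(res.x).sum()           # weights over minority rows
```

Comparing the reweighted minority sample against the original one (e.g., which feature values gain or lose mass) gives a rough analogue of the paper's first use case: features whose distribution must shift to close the gap are natural candidates for proxy variables.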
