Fair Data Adaptation with Quantile Preservation

Fairness of classification and regression has received much attention recently, and various, partially incompatible, criteria have been proposed. The fairness criteria can be enforced for a given classifier or, alternatively, the data can be adapted to ensure that every classifier trained on the data will adhere to the desired fairness criteria. We present a practical data adaptation method based on quantile preservation in causal structural equation models. The data adaptation is based on a presumed counterfactual model for the data. While the counterfactual model itself cannot be verified experimentally, we show that certain population notions of fairness are still guaranteed even if the counterfactual model is misspecified. The precise nature of the fulfilled non-causal fairness notion (such as demographic parity, separation or sufficiency) depends on the structure of the underlying causal model and the choice of resolving variables. We describe an implementation of the proposed data adaptation procedure based on Random Forests and demonstrate its practical use on simulated and real-world data.
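To make the idea of quantile preservation concrete, the following is a minimal sketch for a single variable with no other causal parents than the protected attribute. It is not the implementation described in the abstract: it uses empirical quantiles rather than Random Forests, it ignores the causal graph and resolving variables, and the function name `quantile_preserving_adapt`, the `baseline` argument, and the simulated data are all hypothetical choices made for illustration. Each value in a non-baseline group is replaced by the baseline-group value that sits at the same within-group quantile.

```python
import numpy as np


def quantile_preserving_adapt(x, a, baseline=0):
    """Map each value of x to the baseline-group value at the same
    within-group empirical quantile (hypothetical helper, illustration only)."""
    x = np.asarray(x, dtype=float)
    a = np.asarray(a)
    ref = x[a == baseline]          # values observed in the baseline group
    adapted = x.copy()              # baseline values are left unchanged
    for g in np.unique(a):
        if g == baseline:
            continue
        mask = a == g
        xg = x[mask]
        # Empirical quantile of each value within its own group, in (0, 1).
        ranks = np.argsort(np.argsort(xg))
        u = (ranks + 0.5) / len(xg)
        # Read off the same quantile in the baseline distribution.
        adapted[mask] = np.quantile(ref, u)
    return adapted


# Simulated data (an assumption for this sketch): group 1 is shifted and
# more spread out than group 0.
rng = np.random.default_rng(0)
a = rng.integers(0, 2, size=2000)
x = rng.normal(loc=2.0 * a, scale=1.0 + a)

x_fair = quantile_preserving_adapt(x, a)
```

After the adaptation, the two groups have approximately the same distribution of `x_fair`, while the ordering of individuals within each group is preserved. In the setting described above, the same principle would be applied variable by variable along the causal graph, conditioning on each variable's parents.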
