Metrizing Fairness

We study supervised learning problems for predicting properties of individuals who belong to one of two demographic groups, and we seek predictors that are fair according to statistical parity. This means that the distributions of the predictions within the two groups should be close with respect to the Kolmogorov distance, and fairness is achieved by penalizing the dissimilarity of these two distributions in the objective function of the learning problem. In this paper, we showcase conceptual and computational benefits of measuring unfairness with integral probability metrics (IPMs) other than the Kolmogorov distance. Conceptually, we show that the generator of any IPM can be interpreted as a family of utility functions and that unfairness with respect to this IPM arises if individuals in the two demographic groups have diverging expected utilities. We also prove that the unfairness-regularized prediction loss admits unbiased gradient estimators if unfairness is measured by the squared $\mathcal L^2$-distance or by a squared maximum mean discrepancy (MMD). In this case, the fair learning problem is amenable to efficient stochastic gradient descent (SGD) algorithms. Numerical experiments on real data show that these SGD algorithms outperform state-of-the-art methods for fair learning in that they achieve superior accuracy-unfairness trade-offs, sometimes orders of magnitude faster. Finally, we identify conditions under which statistical parity can improve prediction accuracy.
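
The sketch below (not the authors' implementation) illustrates how such an SGD step could be assembled in PyTorch when unfairness is penalized by the squared MMD between the two groups' predictions: the within-batch U-statistic estimate of the squared MMD is unbiased, so its gradient yields an unbiased stochastic gradient of the regularized objective under standard regularity conditions. The Gaussian kernel, the bandwidth `sigma`, the regularization weight `lam`, the squared-error prediction loss, and the function names are illustrative assumptions, not details taken from the paper.

```python
# Minimal sketch of one SGD step on an unfairness-regularized loss,
# where unfairness is the squared MMD between the predictions of two
# demographic groups. Assumptions (not from the paper): Gaussian kernel
# with bandwidth `sigma`, binary group labels `a`, MSE prediction loss.

import torch
import torch.nn.functional as F


def gaussian_kernel(u, v, sigma=1.0):
    """k(u, v) = exp(-(u - v)^2 / (2 sigma^2)) for scalar predictions."""
    return torch.exp(-(u[:, None] - v[None, :]) ** 2 / (2.0 * sigma ** 2))


def mmd2_unbiased(p0, p1, sigma=1.0):
    """U-statistic estimate of the squared MMD between two prediction
    samples; unbiased for the population squared MMD."""
    m, n = p0.shape[0], p1.shape[0]
    k00 = gaussian_kernel(p0, p0, sigma)
    k11 = gaussian_kernel(p1, p1, sigma)
    k01 = gaussian_kernel(p0, p1, sigma)
    # Drop the diagonal terms so the within-group averages are unbiased.
    term0 = (k00.sum() - k00.diag().sum()) / (m * (m - 1))
    term1 = (k11.sum() - k11.diag().sum()) / (n * (n - 1))
    return term0 + term1 - 2.0 * k01.mean()


def fair_sgd_step(model, optimizer, x, y, a, lam=1.0, sigma=1.0):
    """One stochastic gradient step on: prediction loss + lam * MMD^2."""
    optimizer.zero_grad()
    pred = model(x).squeeze(-1)
    loss = F.mse_loss(pred, y)
    p0, p1 = pred[a == 0], pred[a == 1]
    if p0.shape[0] > 1 and p1.shape[0] > 1:  # need >= 2 samples per group
        loss = loss + lam * mmd2_unbiased(p0, p1, sigma)
    loss.backward()
    optimizer.step()
    return loss.item()
```

In use, `fair_sgd_step` would be called once per minibatch drawn from the training data, with `a` holding the binary group labels; the regularization weight `lam` traces out the accuracy-unfairness trade-off.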
