Algorithmic Fairness Revisited

We study fairness in algorithmic decision making, with the goal of proposing formal and robust definitions and measures of an algorithm's bias with respect to sensitive user features. The main contribution of this work is a statistical framework for reasoning about equity in algorithmic decisions that also accounts for various constraints on the utilities of users and of the algorithm's vendor. We first revisit previous notions of fairness from the literature, which are based on different measures of the dependency between sensitive features and algorithmic decisions. We illustrate several limitations of these measures, such as their failure to generalize to non-binary sensitive features or algorithm outputs, and we propose a more general and robust fairness measure based on mutual information, which has received little attention so far. In particular, we show that our fairness measure yields significantly better characterizations of the statistical significance of an algorithm's bias than the notion of statistical parity introduced by Dwork et al. [17, 73]. We further discuss the inadequacy of previously considered fairness measures, namely their inability to detect large-scale discriminatory practices that arise when algorithms with small biases are applied on a global scale. We initiate a discussion of statistical hypothesis tests, which, despite being standard tools in legal practice, have received little attention in the context of algorithmic fairness. In this regard, we present another advantage of mutual information over other proposed fairness measures: it is directly linked to a popular statistical goodness-of-fit test known as the G-test. We further reason about situations where the absolute parity of an algorithm may be prohibitively at odds with the utility of the algorithm's vendor or its users. We generalize our fairness definitions to include various utilitarian constraints, with a particular focus on discriminatory practices that are considered acceptable because of genuine business-necessity requirements. We describe a framework, mirroring legal practice, that allows businesses to differentiate among users based on genuine task-specific qualification levels in order to guarantee the organization's well-being. Finally, we consider practical issues related to the detection of algorithmic biases from empirical data, and we propose a generic methodology, relying on cluster analysis techniques and robust statistical testing, to reason about discrimination in different subsets of the user population. We evaluate our methods on small artificial datasets, as well as on the Berkeley Graduate Admissions and Adult Census datasets, and illustrate how our techniques can either uncover discrimination hidden in particular user subsets or reveal potential business-necessity requirements that may account for an observed algorithmic bias.
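To make the link between the mutual-information measure and the G-test concrete, the sketch below is a minimal illustration, not the paper's implementation: the 2x2 table of counts is hypothetical, and SciPy/scikit-learn are used only as convenient stand-ins. It computes the empirical mutual information I(S; D) between a binary sensitive feature S and a binary decision D, the statistical-parity difference for the same data, and the G statistic of the log-likelihood-ratio test of independence, which equals 2·N·I(S; D) when I is measured in nats.

```python
# A minimal sketch, assuming hypothetical counts; not the paper's code.
import numpy as np
from scipy.stats import chi2_contingency
from sklearn.metrics import mutual_info_score

# Hypothetical 2x2 contingency table:
# rows = sensitive groups (A, B), columns = decisions (deny, accept).
table = np.array([[240, 60],
                  [200, 100]])
n = table.sum()

# Statistical-parity difference: P(accept | A) - P(accept | B).
parity_diff = table[0, 1] / table[0].sum() - table[1, 1] / table[1].sum()

# Empirical mutual information I(S; D) in nats, computed from the joint counts.
mi = mutual_info_score(None, None, contingency=table)

# G-test of independence (log-likelihood ratio); its statistic equals 2 * N * I(S; D).
g, p_value, dof, _ = chi2_contingency(table, correction=False,
                                      lambda_="log-likelihood")

print(f"statistical parity difference: {parity_diff:+.3f}")
print(f"mutual information I(S; D):   {mi:.5f} nats")
print(f"G statistic: {g:.3f}   2*N*I: {2 * n * mi:.3f}   p-value (dof={dof}): {p_value:.3g}")
```

Because G is asymptotically χ²-distributed under the null hypothesis of independence, the same quantity that measures the bias also yields a p-value, which is the connection to statistical significance discussed above.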

[1] J. Shaffer. Multiple Hypothesis Testing, 1995.

[2] Grey Giddins. Statistics, 2016, The Journal of Hand Surgery (European Volume).

[3] Andrew D. Selbst, et al. Big Data's Disparate Impact, 2016.

[4] Cynthia Rudin, et al. Interpretable classifiers using rules and Bayesian analysis: Building a better stroke prediction model, 2015, ArXiv.

[5] Carlos Eduardo Scheidegger, et al. Certifying and Removing Disparate Impact, 2014, KDD.

[6] Michael Carl Tschantz, et al. A Methodology for Information Flow Experiments, 2014, 2015 IEEE 28th Computer Security Foundations Symposium.

[7] Michael Carl Tschantz, et al. Automated Experiments on Ad Privacy Settings: A Tale of Opacity, Choice, and Discrimination, 2014, ArXiv.

[8] Roxana Geambasu, et al. XRay: Enhancing the Web's Transparency with Differential Correlation, 2014, USENIX Security Symposium.

[9] Alex Alves Freitas, et al. Comprehensible classification models: a position paper, 2014, SKDD.

[10] Muriel Médard, et al. From the Information Bottleneck to the Privacy Funnel, 2014, 2014 IEEE Information Theory Workshop (ITW 2014).

[11] Anupam Datta, et al. Privacy through Accountability: A Computer Science Perspective, 2014, ICDCIT.

[12] D. Borsboom, et al. Simpson's paradox in psychological science: a practical guide, 2013, Front. Psychol.

[13] Josep Domingo-Ferrer, et al. A Methodology for Direct and Indirect Discrimination Prevention in Data Mining, 2013, IEEE Transactions on Knowledge and Data Engineering.

[14] Toniann Pitassi, et al. Learning Fair Representations, 2013, ICML.

[15] Latanya Sweeney, et al. Discrimination in online ad delivery, 2013, CACM.

[16] Jun Sakuma, et al. Fairness-Aware Classifier with Prejudice Remover Regularizer, 2012, ECML/PKDD.

[17] J. Hoey. The Two-Way Likelihood Ratio (G) Test and Comparison to Two-Way Chi Squared Test, 2012, arXiv:1206.4881.

[18] Michael Carl Tschantz, et al. Formalizing and Enforcing Purpose Restrictions in Privacy Policies, 2012, 2012 IEEE Symposium on Security and Privacy.

[19] Gábor E. Tusnády, et al. Information divergence is more χ2-distributed than the χ2-statistics, 2012, 2012 IEEE International Symposium on Information Theory Proceedings.

[20] Toniann Pitassi, et al. Fairness through awareness, 2011, ITCS '12.

[21] Bart Baesens, et al. Performance of classification models from a user perspective, 2011, Decis. Support Syst.

[22] Franco Turini, et al. k-NN as an implementation of situation testing for discrimination discovery and prevention, 2011, KDD.

[23] Gaël Varoquaux, et al. The NumPy Array: A Structure for Efficient Numerical Computation, 2011, Computing in Science & Engineering.

[24] Gaël Varoquaux, et al. Scikit-learn: Machine Learning in Python, 2011, J. Mach. Learn. Res.

[25] Toon Calders, et al. Discrimination Aware Decision Tree Learning, 2010, 2010 IEEE International Conference on Data Mining.

[26] Toon Calders, et al. Three naive Bayes approaches for discrimination-free classification, 2010, Data Mining and Knowledge Discovery.

[27] Franco Turini, et al. DCUBE: discrimination discovery in databases, 2010, SIGMOD Conference.

[28] Franco Turini, et al. Data mining for discrimination discovery, 2010, TKDD.

[29] Franco Turini, et al. Integrating induction and deduction for finding evidence of discrimination, 2009, Artificial Intelligence and Law.

[30] Wes McKinney, et al. Data Structures for Statistical Computing in Python, 2010, SciPy.

[31] Toon Calders, et al. Classifying without discriminating, 2009, 2009 2nd International Conference on Computer, Control and Communication.

[32] Jacek M. Zurada, et al. Normalized Mutual Information Feature Selection, 2009, IEEE Transactions on Neural Networks.

[33] Jennifer L. Peresie. Toward a Coherent Test for Disparate Impact Discrimination, 2009.

[34] Franco Turini, et al. Discrimination-aware data mining, 2008, KDD.

[35] Brian E. Granger, et al. IPython: A System for Interactive Scientific Computing, 2007, Computing in Science & Engineering.

[36] Cynthia Dwork, et al. Differential Privacy, 2006, ICALP.

[37] Helen Nissenbaum, et al. Privacy and contextual integrity: framework and applications, 2006, 2006 IEEE Symposium on Security and Privacy (S&P'06).

[38] H. Abdi. The Bonferroni and Šidák Corrections for Multiple Comparisons, 2006.

[39] I. Kojadinovic. On the use of mutual information in data analysis: an overview, 2005.

[40] Deniz Erdogmus, et al. Lower and Upper Bounds for Misclassification Probability Based on Renyi's Information, 2004, J. VLSI Signal Process.

[41] A. Kraskov, et al. Estimating mutual information, 2003, Physical Review E: Statistical, Nonlinear, and Soft Matter Physics.

[42] Alexandre V. Evfimievski, et al. Limiting privacy breaches in privacy preserving data mining, 2003, PODS.

[43] Liam Paninski, et al. Estimation of Entropy and Mutual Information, 2003, Neural Computation.

[44] Isabelle Guyon, et al. An Introduction to Variable and Feature Selection, 2003, J. Mach. Learn. Res.

[45] Tristin K. Green. Discrimination in Workplace Dynamics: Toward a Structural Account of Disparate Treatment Theory, 2003.

[46] Thomas E. Nichols, et al. Nonparametric permutation tests for functional neuroimaging: A primer with examples, 2002, Human Brain Mapping.

[47] Y. Benjamini, et al. The Control of the False Discovery Rate in Multiple Testing under Dependency, 2001.

[48] Charu C. Aggarwal, et al. On the design and quantification of privacy preserving data mining algorithms, 2001, PODS.

[49] Naftali Tishby, et al. The information bottleneck method, 2000, ArXiv.

[50] Susan Grover. The Business Necessity Defense in Disparate Impact Discrimination Cases, 1996.

[51] Andrew C. Spiropoulos. Defining the Business Necessity Defense to the Disparate Impact Cause of Action: Finding the Golden Mean, 1995.

[52] N. Merhav, et al. Relations Between Entropy and Error Probability, 1993, Proceedings of the IEEE International Symposium on Information Theory.

[53] Joseph L. Gastwirth. Statistical Reasoning in the Legal Setting, 1992.

[54] Daniel L. Rubinfeld. Econometrics in the Courtroom, 1985.

[55] Leo Breiman, et al. Classification and Regression Trees, 1984.

[56] G. Sher. What Makes a Lottery Fair, 1980.

[57] S. Holm. A Simple Sequentially Rejective Multiple Test Procedure, 1979.

[58] Elaine W. Shoben. Differential Pass-Fail Rates in Employment Testing: Statistical Proof Under Title VII, 1978.

[59] Z. Šidák. Rectangular Confidence Regions for the Means of Multivariate Normal Distributions, 1967.