Bias In, Bias Out? Evaluating the Folk Wisdom

We evaluate the folk wisdom that algorithmic decision rules trained on data produced by biased human decision-makers necessarily inherit that bias. We consider a setting where training labels are generated only if a biased decision-maker takes a particular action, so that "biased" training data arise from discriminatory selection into the sample. In our baseline model, the more biased the decision-maker is against a group, the more the resulting algorithmic decision rule favors that group; we refer to this phenomenon as "bias reversal." We then clarify the conditions that give rise to bias reversal: whether a prediction algorithm reverses or inherits bias depends critically on how the decision-maker shapes the training data and on the label used in training. We illustrate our main theoretical results in a simulation study based on the New York City Stop, Question and Frisk dataset.
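
To make the selective-labels mechanism concrete, here is a minimal simulation sketch. The functional forms, thresholds, and group coding below are illustrative assumptions, not the paper's actual model or the Stop, Question and Frisk data: a biased decision-maker stops the disfavored group on weaker evidence, outcomes are observed only for stopped individuals, and a predictor is then trained on that selectively labeled sample.

```python
# Illustrative sketch of discriminatory selection into training data and
# "bias reversal." All parameters below are assumptions made for exposition.
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
n = 200_000

# Group membership (0 = favored, 1 = disfavored) and a latent risk signal.
group = rng.integers(0, 2, size=n)
risk = rng.normal(size=n)                       # unobserved true signal
p_outcome = 1 / (1 + np.exp(-(risk - 1.0)))     # same outcome model for both groups
outcome = rng.binomial(1, p_outcome)            # e.g., contraband found if stopped

# Biased decision-maker: stops group 1 at a lower evidentiary threshold,
# so training labels are generated only for stopped individuals.
threshold = np.where(group == 1, -0.5, 0.5)
stopped = risk + rng.normal(scale=0.5, size=n) > threshold

# Train a prediction rule on the selectively labeled sample only.
X = np.column_stack([group[stopped]])
clf = LogisticRegression().fit(X, outcome[stopped])

# Among labeled cases, group 1 was stopped on weaker evidence, so its observed
# hit rate is lower and the trained rule scores it as *less* risky.
for g in (0, 1):
    hit_rate = outcome[stopped][group[stopped] == g].mean()
    pred_risk = clf.predict_proba([[g]])[0, 1]
    print(f"group {g}: observed hit rate = {hit_rate:.3f}, predicted risk = {pred_risk:.3f}")
```

Under these assumed parameters, both the observed hit rate and the predicted risk come out lower for the group the decision-maker is biased against, which is the bias-reversal pattern described above rather than an inheritance of the decision-maker's bias.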
