Conscientious Classification: A Data Scientist's Guide to Discrimination-Aware Classification

Recent research has raised awareness that machine-learning systems fueled by big data can create or exacerbate troubling disparities in society. Much of this research comes from outside the practicing data science community, leaving its members with little concrete guidance for proactively addressing these concerns. This article introduces issues of discrimination to the data science community on its own terms. We tour the familiar data-mining process and provide a taxonomy of common practices that can produce unintended discrimination. We also survey how discrimination is commonly measured and suggest how familiar development processes can be augmented to mitigate a system's discriminatory potential. We argue that data scientists should be intentional about modeling and reducing discriminatory outcomes. Otherwise, their work will perpetuate any systemic discrimination that exists, under a misleading veil of data-driven objectivity.
