Optimized Data Pre-Processing for Discrimination Prevention

Non-discrimination is a recognized objective in algorithmic decision making. In this paper, we introduce a novel probabilistic formulation of data pre-processing for reducing discrimination. We propose a convex optimization for learning a data transformation with three goals: controlling discrimination, limiting distortion in individual data samples, and preserving utility. We characterize the impact of limited sample size in accomplishing this objective, and apply two instances of the proposed optimization to datasets, including one on real-world criminal recidivism. The results demonstrate that all three criteria can be simultaneously achieved and also reveal interesting patterns of bias in American society.
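To make the three goals concrete, the following minimal sketch evaluates them for one hand-picked randomized transformation on a toy dataset. It does not solve the convex optimization described above. The data, the label-flipping transformation, and the three measures used here (gap in favorable-outcome rates across groups, expected fraction of changed labels, and shift in the overall outcome rate) are illustrative assumptions, not the paper's definitions.

```python
# Toy records: (protected group d, binary outcome y). Hypothetical data.
records = (
    [("a", 1)] * 30 + [("a", 0)] * 70 +
    [("b", 1)] * 60 + [("b", 0)] * 40
)
groups = ["a", "b"]

# Assumed randomized transformation: flip[(d, y)] is the probability that
# an outcome y observed in group d is flipped to 1 - y.
flip = {("a", 0): 0.2, ("a", 1): 0.0, ("b", 0): 0.0, ("b", 1): 0.2}


def outcome_rate(recs, group):
    """Empirical P(Y = 1 | D = group)."""
    ys = [y for d, y in recs if d == group]
    return sum(ys) / len(ys)


def transformed_rate(recs, group, flip_probs):
    """P(Y_hat = 1 | D = group) after applying the random flips."""
    p1 = outcome_rate(recs, group)
    return p1 * (1 - flip_probs[(group, 1)]) + (1 - p1) * flip_probs[(group, 0)]


def gap(rates):
    """Assumed discrimination measure: largest difference between group rates."""
    return max(rates.values()) - min(rates.values())


def expected_distortion(recs, flip_probs):
    """Assumed distortion measure: expected fraction of records whose label changes."""
    return sum(flip_probs[(d, y)] for d, y in recs) / len(recs)


def overall_rate(recs, rates):
    """Overall P(outcome = 1), weighting each group rate by its group size."""
    sizes = {g: sum(1 for d, _ in recs if d == g) for g in rates}
    return sum(rates[g] * sizes[g] for g in rates) / len(recs)


before = {g: outcome_rate(records, g) for g in groups}
after = {g: transformed_rate(records, g, flip) for g in groups}

gap_before = gap(before)                          # discrimination before
gap_after = gap(after)                            # discrimination after
distortion = expected_distortion(records, flip)   # per-sample distortion
utility_shift = abs(overall_rate(records, after) - overall_rate(records, before))

print("gap before:", round(gap_before, 4))
print("gap after:", round(gap_after, 4))
print("expected distortion:", round(distortion, 4))
print("overall rate shift:", round(utility_shift, 4))
```

In this toy setting, flipping a modest share of labels narrows the between-group gap while barely moving the overall outcome rate. An optimization of the kind described in the abstract would choose the transformation automatically, rather than taking it as given.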
