Learning Fair Classifiers

Automated data-driven decision systems are ubiquitous across a wide variety of online services, from social networking and e-commerce to e-government. These systems rely on complex learning methods and vast amounts of data to optimize service functionality, end-user satisfaction, and profitability. However, there is growing concern that these automated decisions can discriminate against users even in the absence of intent, resulting in a lack of fairness: their outcomes have a disproportionately large adverse impact on particular groups of people sharing one or more sensitive attributes (e.g., race, sex). In this paper, we introduce a flexible mechanism to design fair classifiers in a principled manner. We then instantiate this mechanism on three well-known classifiers: logistic regression, hinge loss, and linear and nonlinear support vector machines. Experiments on both synthetic and real-world data show that our mechanism allows fine-grained control of the level of fairness, often at a minimal cost in accuracy, and that it provides more flexibility than alternatives.
