Fairness-Aware Classifier with Prejudice Remover Regularizer

With the spread of data mining technologies and the accumulation of social data, such technologies and data are increasingly used for decisions that seriously affect individuals' lives. For example, credit scores are frequently computed from past credit records using statistical prediction techniques. Needless to say, such decisions must be nondiscriminatory and fair with respect to sensitive features such as race, gender, and religion. Several researchers have recently begun developing analysis techniques that are aware of social fairness or discrimination. They have shown that simply excluding sensitive features is insufficient to eliminate bias, because sensitive information can still influence decisions indirectly through correlated non-sensitive features. In this paper, we first discuss three causes of unfairness in machine learning. We then propose a regularization approach that is applicable to any probabilistic discriminative model. We further apply this approach to logistic regression and empirically show its effectiveness and efficiency.
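
To make the regularization idea concrete, the following is a minimal sketch, not the authors' reference implementation: L2-regularized logistic regression whose objective adds a prejudice-remover-style penalty, an empirical estimate of the mutual information between the model's predictions and the sensitive feature s, computed from the model's own outputs. The function names (objective, fit_prejudice_remover), the hyperparameters eta and lam, the optimizer choice, and the use of a single shared weight vector (the paper fits separate parameters per sensitive group) are all illustrative assumptions.

    import numpy as np
    from scipy.optimize import minimize

    def sigmoid(z):
        # numerically stable logistic function
        return 1.0 / (1.0 + np.exp(-np.clip(z, -30.0, 30.0)))

    def objective(w, X, y, s, eta, lam):
        # negative log-likelihood + eta * prejudice penalty + L2 term
        p = sigmoid(X @ w)          # model estimate of Pr[y=1 | x]
        eps = 1e-12
        nll = -np.sum(y * np.log(p + eps) + (1.0 - y) * np.log(1.0 - p + eps))
        # empirical mutual information between predictions and s:
        #   sum_i sum_y p(y|x_i) * ln( Pr_hat[y|s_i] / Pr_hat[y] ),
        # with Pr_hat[.] estimated as sample means of the predictions
        p_y1 = p.mean()             # sample estimate of Pr[y=1]
        penalty = 0.0
        for group in np.unique(s):
            m = (s == group)
            p_y1_s = p[m].mean()    # sample estimate of Pr[y=1 | s=group]
            penalty += np.sum(
                p[m] * np.log((p_y1_s + eps) / (p_y1 + eps))
                + (1.0 - p[m]) * np.log((1.0 - p_y1_s + eps) / (1.0 - p_y1 + eps))
            )
        return nll + eta * penalty + 0.5 * lam * (w @ w)

    def fit_prejudice_remover(X, y, s, eta=1.0, lam=1e-3):
        # L-BFGS-B with scipy's default finite-difference gradient
        w0 = np.zeros(X.shape[1])
        res = minimize(objective, w0, args=(X, y, s, eta, lam), method="L-BFGS-B")
        return res.x

    # hypothetical usage: s_train holds the sensitive feature per example
    # w = fit_prejudice_remover(X_train, y_train, s_train, eta=5.0)

Raising eta pushes the groupwise prediction rates Pr[y=1|s] toward the marginal rate Pr[y=1], trading predictive accuracy for independence from the sensitive feature; with eta=0 the sketch reduces to plain L2-regularized logistic regression.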
