FAE: A Fairness-Aware Ensemble Framework

Automated decision making based on big data and machine learning (ML) algorithms can result in discriminatory decisions against certain protected groups defined by personal attributes such as gender, race, or sexual orientation. Algorithms designed to discover patterns in big data may not only pick up societal biases encoded in the training data but, even worse, reinforce such biases, resulting in more severe discrimination. The majority of fairness-aware machine learning approaches proposed so far focus solely on the pre-, in-, or post-processing steps of the ML process, that is, on the input data, the learning algorithms, or the derived models, respectively. However, the fairness problem cannot be isolated to a single step of the ML process; rather, discrimination often results from complex interactions between big data and algorithms, and therefore a more holistic approach is required. The proposed FAE (Fairness-Aware Ensemble) framework combines fairness-related interventions at both the pre- and post-processing steps of the data analysis process. In the pre-processing step, we tackle the problems of under-representation of the protected group (group imbalance) and of class imbalance by generating balanced training samples. In the post-processing step, we tackle the problem of class overlapping by shifting the decision boundary in the direction of fairness.
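The following is a minimal sketch of the two interventions described above, assuming scikit-learn-style classifiers, a binary class label, and a binary protected attribute. The sampling scheme, ensemble size, and the threshold-search criterion (equalizing positive rates across groups on a validation set) are illustrative assumptions, not the exact FAE procedure from the paper.

```python
# Illustrative sketch, not the authors' exact implementation:
# (1) pre-processing: balanced sampling over (protected group, class) cells,
# (2) an ensemble of classifiers trained on those balanced samples,
# (3) post-processing: shifting the decision threshold in the direction of
#     fairness, here by equalizing group-wise positive rates on validation data.
import numpy as np
from sklearn.base import clone
from sklearn.linear_model import LogisticRegression


def balanced_sample(X, y, s, rng):
    """Draw an equal number of instances from each (group, class) cell."""
    cells = [np.where((s == g) & (y == c))[0] for g in (0, 1) for c in (0, 1)]
    n = min(len(idx) for idx in cells)  # size of the rarest cell
    chosen = np.concatenate([rng.choice(idx, n, replace=False) for idx in cells])
    return X[chosen], y[chosen]


def fit_ensemble(X, y, s, n_estimators=10, seed=0):
    """Train one base learner per balanced sample (base learner is an assumption)."""
    rng = np.random.default_rng(seed)
    models = []
    for _ in range(n_estimators):
        Xb, yb = balanced_sample(X, y, s, rng)
        models.append(clone(LogisticRegression(max_iter=1000)).fit(Xb, yb))
    return models


def ensemble_scores(models, X):
    """Average the positive-class probabilities of the ensemble members."""
    return np.mean([m.predict_proba(X)[:, 1] for m in models], axis=0)


def fair_threshold(models, X_val, s_val, grid=np.linspace(0.3, 0.7, 41)):
    """Pick the threshold whose positive rates are closest across the two groups."""
    scores = ensemble_scores(models, X_val)
    gaps = [abs((scores[s_val == 1] >= t).mean() - (scores[s_val == 0] >= t).mean())
            for t in grid]
    return grid[int(np.argmin(gaps))]
```

Final predictions would then be `ensemble_scores(models, X_test) >= fair_threshold(models, X_val, s_val)`; in the framework described above, the boundary shift targets the overlap region affecting the protected group, so the post-processing step would be adapted to that criterion rather than the simple positive-rate gap used in this sketch.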
