FairPrep: Promoting Data to a First-Class Citizen in Studies on Fairness-Enhancing Interventions

The importance of incorporating ethics and legal compliance into machine-assisted decision-making is broadly recognized. Further, several lines of recent work have argued that critical opportunities for improving data quality and representativeness, controlling for bias, and allowing humans to oversee and impact computational processes are missed if we do not consider the lifecycle stages upstream from model training and deployment. Yet, very little has been done to date to provide system-level support to data scientists who wish to develop and deploy responsible machine learning methods. We aim to fill this gap and present FairPrep, a design and evaluation framework for fairness-enhancing interventions. FairPrep is based on a developer-centered design, and helps data scientists follow best practices in software engineering and machine learning. As part of our contribution, we identify shortcomings in existing empirical studies for analyzing fairness-enhancing interventions. We then show how FairPrep can be used to measure the impact of sound best practices, such as hyperparameter tuning and feature scaling. In particular, our results suggest that the high variability of the outcomes of fairness-enhancing interventions observed in previous studies is often an artifact of a lack of hyperparameter tuning. Further, we show that the choice of a data cleaning method can impact the effectiveness of fairness-enhancing interventions.
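To make the "sound best practices" concrete, the following sketch is purely illustrative: it does not use FairPrep's own API, and the dataset, protected attribute, and parameter grid are hypothetical. Assuming scikit-learn and NumPy, it couples feature scaling and hyperparameter tuning inside a single pipeline (so the scaler is fit only on training folds, avoiding test-set leakage) and then reports a simple demographic-parity gap alongside accuracy, the kind of joint accuracy/fairness readout the framework is meant to standardize.

import numpy as np
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV, train_test_split

# Hypothetical stand-in data: 5 numeric features, a binary protected attribute,
# and a label correlated with both a feature and the group membership.
rng = np.random.RandomState(0)
X = rng.normal(size=(1000, 5))
group = rng.randint(0, 2, size=1000)
y = (X[:, 0] + 0.5 * group + rng.normal(scale=0.5, size=1000) > 0).astype(int)

X_train, X_test, y_train, y_test, g_train, g_test = train_test_split(
    X, y, group, test_size=0.3, random_state=0)

# Feature scaling and the classifier live in one pipeline, so scaling parameters
# are re-estimated on each training fold during the grid search.
pipe = Pipeline([("scale", StandardScaler()),
                 ("clf", LogisticRegression(max_iter=1000))])
search = GridSearchCV(pipe, {"clf__C": [0.01, 0.1, 1.0, 10.0]}, cv=5, scoring="accuracy")
search.fit(X_train, y_train)

pred = search.predict(X_test)
# Crude demographic-parity gap: difference in positive-prediction rates between groups.
rates = [pred[g_test == g].mean() for g in (0, 1)]
print("best C:", search.best_params_["clf__C"])
print("accuracy:", (pred == y_test).mean())
print("demographic parity difference:", abs(rates[0] - rates[1]))

In the same spirit, a data cleaning step (for example, a missing-value imputer) could be added as another pipeline stage, so that its effect on both accuracy and the fairness metric is measured under the same tuning and evaluation protocol.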
