Detecting Bias in Black-Box Models Using Transparent Model Distillation

Black-box risk scoring models permeate our lives, yet they are typically proprietary and opaque. We propose a transparent model distillation approach for detecting bias in such models. Model distillation was originally designed to distill knowledge from a large, complex teacher model to a faster, simpler student model without significant loss in prediction accuracy. We add a third restriction: transparency. In this paper we use data sets that contain two labels: the risk score predicted by a black-box model, and the actual outcome the risk score was intended to predict. This allows us to train and compare a student model for each label. For a particular class of student models, interpretable tree-based generalized additive models with pairwise interactions (GA2Ms), we provide confidence intervals for the difference between the risk score model and the actual outcome model. This yields a new method for detecting bias in black-box risk scores: assess whether the contributions of protected features to the risk score are statistically different from their contributions to the actual outcome.
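
To make the two-model comparison concrete, here is a minimal sketch. Everything in it is an illustrative assumption rather than the paper's implementation: scikit-learn gradient-boosted trees stand in for the GA2M student models, the protected feature is assumed binary, a feature's "contribution" is estimated as the average change in prediction when that feature is flipped, and the confidence interval comes from a simple percentile bootstrap rather than the paper's interval construction.

```python
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor


def flip_effect(model, X, idx):
    """Average change in prediction when the binary protected feature is flipped."""
    X0, X1 = X.copy(), X.copy()
    X0[:, idx], X1[:, idx] = 0, 1
    return float(np.mean(model.predict(X1) - model.predict(X0)))


def bias_interval(X, score, outcome, idx, n_boot=200, alpha=0.05, seed=0):
    """Percentile-bootstrap CI for the difference between the protected feature's
    contribution to the mimicked risk score and to the actual outcome."""
    rng = np.random.default_rng(seed)
    diffs = []
    for _ in range(n_boot):
        b = rng.integers(0, len(X), len(X))  # bootstrap resample (with replacement)
        # Two student models trained on the same features but different labels:
        m_score = GradientBoostingRegressor().fit(X[b], score[b])    # mimics the black-box risk score
        m_true = GradientBoostingRegressor().fit(X[b], outcome[b])   # predicts the actual outcome
        diffs.append(flip_effect(m_score, X[b], idx) - flip_effect(m_true, X[b], idx))
    return tuple(np.quantile(diffs, [alpha / 2, 1 - alpha / 2]))
```

Under these assumptions, an interval that excludes zero suggests the black-box score weights the protected feature differently than the actual outcome warrants, which is the bias signal the paper's method looks for.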
