Detecting Bias in Black-Box Models Using Transparent Model Distillation

Black-box risk scoring models permeate our lives, yet they are typically proprietary and opaque. We propose a transparent model distillation approach for detecting bias in such models. Model distillation was originally designed to distill knowledge from a large, complex teacher model to a faster, simpler student model without significant loss in prediction accuracy. We add a third restriction: transparency. In this paper we use data sets that contain two labels: the risk score predicted by a black-box model, and the actual outcome the risk score was intended to predict. This allows us to train and compare a student model for each label. For a particular class of student models, interpretable tree-based generalized additive models with pairwise interactions (GA2Ms), we provide confidence intervals for the difference between the risk score model and the actual outcome model. This yields a new method for detecting bias in black-box risk scores: assess whether the contributions of protected features to the risk score are statistically different from their contributions to the actual outcome.
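
To make the two-model comparison concrete, here is a minimal sketch. Everything in it is an illustrative assumption rather than the paper's implementation: scikit-learn gradient-boosted trees stand in for the GA2M student models, the protected feature is assumed binary, a feature's "contribution" is estimated as the average change in prediction when that feature is flipped, and the confidence interval comes from a simple percentile bootstrap rather than the paper's interval construction.

```python
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor


def flip_effect(model, X, idx):
    """Average change in prediction when the binary protected feature is flipped."""
    X0, X1 = X.copy(), X.copy()
    X0[:, idx], X1[:, idx] = 0, 1
    return float(np.mean(model.predict(X1) - model.predict(X0)))


def bias_interval(X, score, outcome, idx, n_boot=200, alpha=0.05, seed=0):
    """Percentile-bootstrap CI for the difference between the protected feature's
    contribution to the mimicked risk score and to the actual outcome."""
    rng = np.random.default_rng(seed)
    diffs = []
    for _ in range(n_boot):
        b = rng.integers(0, len(X), len(X))  # bootstrap resample (with replacement)
        # Two student models trained on the same features but different labels:
        m_score = GradientBoostingRegressor().fit(X[b], score[b])    # mimics the black-box risk score
        m_true = GradientBoostingRegressor().fit(X[b], outcome[b])   # predicts the actual outcome
        diffs.append(flip_effect(m_score, X[b], idx) - flip_effect(m_true, X[b], idx))
    return tuple(np.quantile(diffs, [alpha / 2, 1 - alpha / 2]))
```

Under these assumptions, an interval that excludes zero suggests the black-box score weights the protected feature differently than the actual outcome warrants, which is the bias signal the paper's method looks for.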
