Fairwashing: the risk of rationalization

Black-box explanation is the problem of explaining how a machine learning model -- whose internal logic is hidden from the auditor and generally complex -- produces its outcomes. Current approaches to this problem include model explanation, outcome explanation and model inspection. While these techniques can be beneficial by providing interpretability, they can also be misused to perform fairwashing, which we define as promoting the false perception that a machine learning model respects some ethical values. In particular, we demonstrate that it is possible to systematically rationalize decisions taken by an unfair black-box model through both the model explanation and the outcome explanation approaches, with respect to a given fairness metric. Our solution, LaundryML, is based on a regularized rule list enumeration algorithm whose objective is to search for fair rule lists that approximate the unfair black-box model. We empirically evaluate our rationalization technique on black-box models trained on real-world datasets and show that one can obtain rule lists with high fidelity to the black-box model while being considerably less unfair.
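
To make the rationalization objective concrete, the sketch below scores candidate surrogate rule lists by trading off fidelity to the black-box predictions against a fairness gap (here, demographic parity), and keeps the most favorable candidates. This is a minimal sketch under stated assumptions, not the authors' LaundryML implementation: the helper names (`rationalize`, `demographic_parity_gap`), the audit-set variables and the regularization weight `lam` are illustrative, and the candidate rule lists are assumed to come from an external enumeration step.

```python
# Minimal sketch of the rationalization idea (illustrative, not the LaundryML code).
# Assumptions: `candidate_rule_lists` is an iterable of surrogate models exposing
# .predict(X); `blackbox_preds` are the black-box labels on an audit set X;
# `sensitive` is a binary sensitive attribute aligned with X. The objective trades
# off fidelity to the black box against unfairness, weighted by `lam`.

import numpy as np

def fidelity(surrogate_preds, blackbox_preds):
    """Fraction of audit points where the surrogate agrees with the black box."""
    return np.mean(surrogate_preds == blackbox_preds)

def demographic_parity_gap(preds, sensitive):
    """Absolute difference in positive-prediction rates between the two groups."""
    rate_a = preds[sensitive == 1].mean()
    rate_b = preds[sensitive == 0].mean()
    return abs(rate_a - rate_b)

def rationalize(candidate_rule_lists, X, blackbox_preds, sensitive, lam=1.0, top_k=5):
    """Rank candidate rule lists by (high fidelity, low unfairness) and return the best k."""
    scored = []
    for rl in candidate_rule_lists:
        preds = np.asarray(rl.predict(X))
        objective = fidelity(preds, blackbox_preds) - lam * demographic_parity_gap(preds, sensitive)
        scored.append((objective, rl))
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [rl for _, rl in scored[:top_k]]
```

In this framing, the enumeration step is what makes rationalization possible: by generating many high-fidelity surrogates and ranking them with the regularized objective, an adversary can select an explanation that appears fair under the chosen metric while still closely mimicking the unfair black-box model.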
