Making Fair ML Software using Trustworthy Explanation

Machine learning software is now used in many applications with large social impact (finance, hiring, admissions, criminal justice). However, the behavior of this software is sometimes biased, discriminating on sensitive attributes such as sex and race. Prior work has concentrated on finding and mitigating bias in ML models. A recent trend is to use instance-based, model-agnostic explanation methods such as LIME [10] to detect bias in model predictions. Our work concentrates on the shortcomings of current bias measures and explanation methods. We show how our proposed method, based on K nearest neighbors, overcomes those shortcomings and finds the underlying bias of black-box models. Our results are more trustworthy and more helpful for practitioners. Finally, we describe our future framework, which combines explanation and planning to build fair software.
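To make the K-nearest-neighbors idea concrete, below is a minimal sketch of how such a bias probe could work, assuming a scikit-learn style black-box classifier with a predict() method, a numeric feature matrix, and a binary sensitive attribute. The function name knn_bias_probe, the default k=10, and the flip-rate measure are our illustrative choices, not the paper's exact algorithm.

import numpy as np
from sklearn.neighbors import NearestNeighbors

def knn_bias_probe(model, X, instance, sens_idx, k=10):
    # Fit a KNN index on the training data and fetch the k rows
    # closest to the instance under explanation.
    nn = NearestNeighbors(n_neighbors=k).fit(X)
    _, idx = nn.kneighbors(instance.reshape(1, -1))
    neighbors = X[idx[0]].copy()

    # Predictions on the real neighborhood.
    original = model.predict(neighbors)

    # Counterfactual neighborhood: flip the binary sensitive attribute
    # (e.g., sex), leaving every other feature untouched.
    flipped = neighbors.copy()
    flipped[:, sens_idx] = 1 - flipped[:, sens_idx]
    counterfactual = model.predict(flipped)

    # Fraction of neighbors whose prediction changes; a high rate
    # suggests the decision in this region depends on the sensitive
    # attribute, a low rate that the region is treated consistently.
    return float(np.mean(original != counterfactual))

Because the probe only calls predict(), it treats the model as a black box; and because the neighbors are real data points rather than the synthetic perturbations that LIME samples, the resulting evidence of bias is arguably harder for an adversarial model to conceal [3, 9].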

References

[1] Jun Sakuma et al. Fairness-Aware Classifier with Prejudice Remover Regularizer. ECML/PKDD, 2012.
[2] M. Kearns et al. Fairness in Criminal Justice Risk Assessments: The State of the Art. Sociological Methods & Research, 2017.
[3] Adrian Weller et al. You Shouldn't Trust Me: Learning Models Which Conceal Unfairness From Multiple Explanation Methods. SafeAI@AAAI, 2020.
[4] Diptikalyan Saha et al. Black box fairness testing of machine learning models. ESEC/FSE, 2019.
[5] Nathan Srebro et al. Equality of Opportunity in Supervised Learning. NIPS, 2016.
[6] Guy N. Rothblum et al. Probably Approximately Metric-Fair Learning. ICML, 2018.
[7] Jon M. Kleinberg et al. On Fairness and Calibration. NIPS, 2017.
[8] Yuriy Brun et al. Themis: automatically testing software for discrimination. ESEC/FSE, 2018.
[9] Sameer Singh et al. Fooling LIME and SHAP: Adversarial Attacks on Post hoc Explanation Methods. AIES, 2020.
[10] Sameer Singh et al. "Why Should I Trust You?": Explaining the Predictions of Any Classifier. NAACL, 2016.
[11] Toniann Pitassi et al. Learning Fair Representations. ICML, 2013.
[12] Krishna P. Gummadi et al. iFair: Learning Individually Fair Data Representations for Algorithmic Decision Making. ICDE, 2019.
[13] J. Roemer et al. Equality of Opportunity. 2013.
[14] Blake Lemoine et al. Mitigating Unwanted Biases with Adversarial Learning. AIES, 2018.
[15] Yuriy Brun et al. Fairness testing: testing software for discrimination. ESEC/FSE, 2017.
[16] Tim Menzies et al. Fairway: a way to build fair ML software. ESEC/FSE, 2020.
[17] Christopher Jung et al. Online Learning with an Unknown Fairness Metric. NeurIPS, 2018.
[18] Arvind Narayanan et al. Semantics derived automatically from language corpora contain human-like biases. Science, 2016.
[19] Kush R. Varshney et al. Optimized Pre-Processing for Discrimination Prevention. NIPS, 2017.
[20] Rachel K. E. Bellamy et al. AI Fairness 360: An Extensible Toolkit for Detecting, Understanding, and Mitigating Unwanted Algorithmic Bias. arXiv, 2018.
[21] Xiangliang Zhang et al. Exploiting reject option in classification for social discrimination control. Information Sciences, 2018.
[22] Toniann Pitassi et al. Fairness through awareness. ITCS, 2012.
[23] Reuben Binns. On the apparent conflict between individual and group fairness. FAT*, 2019.
[24] Scott Lundberg et al. A Unified Approach to Interpreting Model Predictions. NIPS, 2017.
[25] Sudipta Chattopadhyay et al. Automated Directed Fairness Testing. ASE, 2018.
[26] Toon Calders et al. Data preprocessing techniques for classification without discrimination. Knowledge and Information Systems, 2011.
[27] Nathalie A. Smuha. The EU Approach to Ethics Guidelines for Trustworthy Artificial Intelligence. Computer Law Review International, 2019.
[28] Jon M. Kleinberg et al. Inherent Trade-Offs in the Fair Determination of Risk Scores. ITCS, 2016.