Developing a Fidelity Evaluation Approach for Interpretable Machine Learning

Although modern machine learning and deep learning methods allow for complex and in-depth data analytics, the predictive models they generate are often highly complex and lack transparency. Explainable AI (XAI) methods are used to improve the interpretability of these complex models and, in doing so, improve transparency. However, the inherent fitness of these explainable methods can be hard to evaluate. In particular, methods to evaluate the fidelity of an explanation to the underlying black box require further development, especially for tabular data. In this paper, we (a) propose a three-phase approach to developing an evaluation method; (b) adapt an existing evaluation method, designed primarily for image and text data, to evaluate models trained on tabular data; and (c) evaluate two popular explainable methods using this evaluation method. Our evaluations suggest that the internal mechanism of the underlying predictive model, the internal mechanism of the explainable method used, and the complexity of the model and data all affect explanation fidelity. Because explanation fidelity is so sensitive to the context, tools, and data used, we could not clearly identify any one explainable method as superior to another.
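To make the notion of explanation fidelity concrete, the sketch below illustrates one common family of fidelity checks for tabular data: perturb the features an explanation ranks as most important and measure how much the model's prediction changes, comparing against a random ranking as a baseline. This is an illustrative assumption rather than the paper's exact procedure; the dataset, the RandomForestClassifier, the `prediction_drop` helper, and the use of global feature importances as a stand-in for a local LIME or SHAP attribution are all choices made here purely to keep the example self-contained.

```python
# Minimal sketch of a perturbation-based fidelity check for tabular data
# (illustrative only). The intuition: if an explanation's top-ranked features
# are truly the ones the model relies on, perturbing them should change the
# model's prediction more than perturbing randomly chosen features.
import numpy as np
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import RandomForestClassifier

rng = np.random.default_rng(0)
X, y = load_breast_cancer(return_X_y=True)
model = RandomForestClassifier(n_estimators=100, random_state=0).fit(X, y)

def prediction_drop(model, x, feature_ranking, background, top_k=5, n_samples=50):
    """Average drop in the predicted-class probability when the top_k features
    (by the explanation's ranking) are replaced with values drawn from the
    background (training) data."""
    proba = model.predict_proba(x.reshape(1, -1))[0]
    cls, base = proba.argmax(), proba.max()
    drops = []
    for _ in range(n_samples):
        x_pert = x.copy()
        donor = background[rng.integers(len(background))]
        x_pert[feature_ranking[:top_k]] = donor[feature_ranking[:top_k]]
        drops.append(base - model.predict_proba(x_pert.reshape(1, -1))[0][cls])
    return float(np.mean(drops))

# Stand-in attribution: rank features by the model's global importances.
# In practice the ranking would come from a local explainer such as LIME or
# SHAP for the specific instance being explained.
x = X[0]
explained_ranking = np.argsort(model.feature_importances_)[::-1]
random_ranking = rng.permutation(X.shape[1])

print("drop (explained ranking):", prediction_drop(model, x, explained_ranking, X))
print("drop (random ranking):   ", prediction_drop(model, x, random_ranking, X))
```

Under this kind of test, a higher prediction drop for the explanation-derived ranking than for the random baseline is taken as evidence that the explanation is faithful to the underlying model.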
