Partial Order: Finding Consensus among Uncertain Feature Attributions

Post-hoc feature importance is increasingly being employed to explain the decisions of complex machine learning models. Yet in practice, reruns of the training algorithm and/or the explainer can produce contradictory statements of feature importance, thereby reducing trust in those techniques. A possible avenue to address this issue is to develop strategies for aggregating diverse explanations of feature importance. While the arithmetic mean, which yields a total order, has been proposed, we introduce an alternative: the consensus among multiple models, which results in partial orders. The two aggregation strategies are compared using Integrated Gradients and Shapley values on two regression datasets, and we show that a large portion of the information provided by the mean aggregation is not supported by the consensus of each individual model, casting doubt on the trustworthiness of this practice.
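To make the contrast between the two aggregation strategies concrete, the following is a minimal sketch, with made-up attribution scores, of how mean aggregation yields a total order while a consensus rule yields a partial order. The array names, the toy values, and the "dominates in every run" rule used here are illustrative assumptions, not the paper's exact procedure.

```python
import numpy as np

# Hypothetical attribution scores: rows = reruns of the model/explainer,
# columns = features. In practice these could come from Integrated
# Gradients or Shapley-value explainers.
attributions = np.array([
    [0.9, 0.4, 0.10],  # run 1
    [0.7, 0.5, 0.20],  # run 2
    [0.8, 0.3, 0.35],  # run 3
])

# Mean aggregation: average the scores, then rank -> a single total order.
mean_scores = attributions.mean(axis=0)
total_order = np.argsort(-mean_scores)  # features sorted by mean importance

# Consensus aggregation (one simple variant): feature i is ranked above
# feature j only if it receives a larger attribution in *every* run.
# Pairs on which the runs disagree remain incomparable -> a partial order.
n_features = attributions.shape[1]
dominates = np.zeros((n_features, n_features), dtype=bool)
for i in range(n_features):
    for j in range(n_features):
        if i != j:
            dominates[i, j] = np.all(attributions[:, i] > attributions[:, j])

print("total order (mean aggregation):", total_order)
print("consensus dominance matrix:\n", dominates)
```

With these toy numbers the mean ranks feature 1 above feature 2, but the runs disagree on that pair (run 3 reverses it), so the consensus leaves the two features incomparable: exactly the kind of mean-induced ordering that is not supported by every individual model.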
