Partial Order: Finding Consensus among Uncertain Feature Attributions

Post-hoc feature importance is increasingly being employed to explain the decisions of complex machine learning models. Yet in practice, reruns of the training algorithm and/or the explainer can produce contradictory statements of feature importance, thereby reducing trust in those techniques. A possible avenue to address this issue is to develop strategies for aggregating diverse explanations of feature importance. While the arithmetic mean, which yields a total order, has been proposed, we introduce an alternative: the consensus among multiple models, which results in partial orders. The two aggregation strategies are compared using Integrated Gradients and Shapley values on two regression datasets, and we show that a large portion of the information provided by the mean aggregation is not supported by the consensus of each individual model, casting doubt on the trustworthiness of this practice.
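To make the contrast between the two aggregation strategies concrete, the following is a minimal sketch, with made-up attribution scores, of how mean aggregation yields a total order while a consensus rule yields a partial order. The array names, the toy values, and the "dominates in every run" rule used here are illustrative assumptions, not the paper's exact procedure.

```python
import numpy as np

# Hypothetical attribution scores: rows = reruns of the model/explainer,
# columns = features. In practice these could come from Integrated
# Gradients or Shapley-value explainers.
attributions = np.array([
    [0.9, 0.4, 0.10],  # run 1
    [0.7, 0.5, 0.20],  # run 2
    [0.8, 0.3, 0.35],  # run 3
])

# Mean aggregation: average the scores, then rank -> a single total order.
mean_scores = attributions.mean(axis=0)
total_order = np.argsort(-mean_scores)  # features sorted by mean importance

# Consensus aggregation (one simple variant): feature i is ranked above
# feature j only if it receives a larger attribution in *every* run.
# Pairs on which the runs disagree remain incomparable -> a partial order.
n_features = attributions.shape[1]
dominates = np.zeros((n_features, n_features), dtype=bool)
for i in range(n_features):
    for j in range(n_features):
        if i != j:
            dominates[i, j] = np.all(attributions[:, i] > attributions[:, j])

print("total order (mean aggregation):", total_order)
print("consensus dominance matrix:\n", dominates)
```

With these toy numbers the mean ranks feature 1 above feature 2, but the runs disagree on that pair (run 3 reverses it), so the consensus leaves the two features incomparable: exactly the kind of mean-induced ordering that is not supported by every individual model.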
