Paper information - Promises and Pitfalls of Black-Box Concept Learning Models

Promises and Pitfalls of Black-Box Concept Learning Models

Machine learning models that incorporate concept learning as an intermediate step in their decision-making process can match the performance of black-box predictive models while retaining the ability to explain outcomes in human-understandable terms. However, we demonstrate that the concept representations learned by these models encode information beyond the pre-defined concepts, and that natural mitigation strategies do not fully work, rendering the interpretation of the downstream prediction misleading. We describe the mechanism underlying the information leakage and suggest recourse for mitigating its effects.
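To make the leakage claim concrete, the following is a minimal toy sketch, not the authors' experiment. It assumes a single binary concept, a label that is statistically independent of that concept, and a hypothetical concept predictor whose soft score is nudged slightly by the label. All names (`soft_concept_score`, `leak`, the logit offset of 3.0) are illustrative assumptions. Thresholding the soft score recovers the concept perfectly, yet the residual confidence still carries the label, so a downstream model fed the soft score can predict something the concept itself says nothing about.

```python
import numpy as np


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


def logit(p):
    return np.log(p / (1.0 - p))


def make_toy_data(n=2000, seed=0):
    """Binary concept and binary label, drawn independently of each other."""
    rng = np.random.default_rng(seed)
    concept = rng.integers(0, 2, size=n)
    label = rng.integers(0, 2, size=n)
    return concept, label


def soft_concept_score(concept, label, leak=0.5):
    """Hypothetical concept predictor: the sign of the logit encodes the concept,
    while a small offset (an assumption of this toy) encodes the label."""
    return sigmoid(3.0 * (2 * concept - 1) + leak * (2 * label - 1))


def decode_label_from_soft(score):
    """Hand-built decoder: read the label from the confidence residual."""
    concept_hat = (score > 0.5).astype(int)
    residual = logit(score) - 3.0 * (2 * concept_hat - 1)
    return (residual > 0).astype(int)


def best_accuracy_from_hard(concept_hat, label):
    """Best achievable label accuracy for any function of the hard concept alone."""
    correct = 0
    for value in (0, 1):
        group = label[concept_hat == value]
        if group.size:
            correct += max(np.sum(group == 0), np.sum(group == 1))
    return correct / label.size


concept, label = make_toy_data()
score = soft_concept_score(concept, label)
concept_hat = (score > 0.5).astype(int)

concept_acc = np.mean(concept_hat == concept)
soft_label_acc = np.mean(decode_label_from_soft(score) == label)
hard_label_acc = best_accuracy_from_hard(concept_hat, label)

print("concept accuracy from thresholded score:", concept_acc)
print("label accuracy from soft score:", soft_label_acc)
print("best label accuracy from hard concept:", hard_label_acc)
```

In this construction the hard concept predicts the label at roughly chance level, while the soft score predicts it exactly. This is the kind of gap that would make a concept-based explanation of the downstream prediction misleading.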
