Human-in-the-loop Extraction of Interpretable Concepts in Deep Learning Models

The interpretation of deep neural networks (DNNs) has become a key topic as they are increasingly applied to solve diverse problems and make critical decisions. Concept-based explanations have recently become a popular approach for post-hoc interpretation of DNNs. However, identifying human-understandable visual concepts that affect model decisions is challenging and not easily addressed by fully automatic approaches. We present a novel human-in-the-loop approach to generate user-defined concepts for model interpretation and diagnostics. Central to our proposal is the use of active learning, which combines human knowledge and feedback to train a concept extractor with very little human labeling effort. We integrate this process into an interactive system, ConceptExtract. Through two case studies, we show how our approach helps analyze model behavior and extract human-friendly concepts for different machine learning tasks and datasets, and how these concepts can be used to understand predictions, compare model performance, and suggest model refinements. Quantitative experiments show that our active learning approach can accurately extract meaningful visual concepts. More importantly, by identifying visual concepts that negatively affect model performance, we develop a corresponding data augmentation strategy that consistently improves model performance.
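To make the active-learning idea concrete, the following is a minimal sketch (not the authors' implementation) of an uncertainty-based labeling loop for a binary concept classifier: a small head is trained on a few human-labeled image patches, the most uncertain unlabeled patches are surfaced for the user to label, and the classifier is retrained. All names here (ConceptNet, the feature dimensions, the simulated oracle) are illustrative assumptions, not the system's actual API.

```python
# Hedged sketch of a human-in-the-loop active-learning round for concept extraction.
# Assumes patch embeddings have already been extracted from the target DNN.
import torch
import torch.nn as nn
import torch.nn.functional as F


class ConceptNet(nn.Module):
    """Tiny binary classifier over pre-extracted patch features (hypothetical)."""
    def __init__(self, feat_dim: int = 512):
        super().__init__()
        self.head = nn.Sequential(nn.Linear(feat_dim, 64), nn.ReLU(), nn.Linear(64, 2))

    def forward(self, x):
        return self.head(x)


def train(model, feats, labels, epochs: int = 20, lr: float = 1e-3):
    # Full-batch training is enough for the handful of labeled patches in early rounds.
    opt = torch.optim.Adam(model.parameters(), lr=lr)
    for _ in range(epochs):
        opt.zero_grad()
        loss = F.cross_entropy(model(feats), labels)
        loss.backward()
        opt.step()


def most_uncertain(model, unlabeled_feats, k: int = 8):
    """Rank unlabeled patches by predictive entropy and return the top-k indices."""
    with torch.no_grad():
        probs = F.softmax(model(unlabeled_feats), dim=1)
        entropy = -(probs * probs.clamp_min(1e-8).log()).sum(dim=1)
    return entropy.topk(k).indices


# Toy usage: random features stand in for patch embeddings from the target DNN.
feat_dim, n_labeled, n_pool = 512, 16, 200
labeled_x = torch.randn(n_labeled, feat_dim)
labeled_y = torch.randint(0, 2, (n_labeled,))
pool_x = torch.randn(n_pool, feat_dim)

model = ConceptNet(feat_dim)
for round_idx in range(3):                      # a few human-in-the-loop rounds
    train(model, labeled_x, labeled_y)
    query_idx = most_uncertain(model, pool_x)   # patches shown to the user
    # In the real system the user labels these patches in the UI; here the
    # oracle is simulated with random labels purely to keep the sketch runnable.
    new_y = torch.randint(0, 2, (len(query_idx),))
    labeled_x = torch.cat([labeled_x, pool_x[query_idx]])
    labeled_y = torch.cat([labeled_y, new_y])
    keep = torch.ones(len(pool_x), dtype=torch.bool)
    keep[query_idx] = False
    pool_x = pool_x[keep]
```

Once the concept classifier is reliable, its predictions over the unlabeled pool can be used to flag concepts correlated with model errors, which is where a targeted data augmentation strategy would come in.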
