WheaCha: A Method for Explaining the Predictions of Code Summarization Models

The last decade has witnessed a rapid advance in learning-based program analysis tools. While these systems can produce powerful predictions, their black-box nature means the predictions cannot be directly explained, posing a threat to the continuing application of machine learning technology in programming language research. Recently, attribution methods have emerged as a popular approach to interpreting model predictions based on the relevance of input features. Although their feature-importance rankings can provide insights into how models arrive at a prediction from a raw input, they do not give a clear-cut definition of the key features a model uses for its prediction. In this paper, we present a new method, called WheaCha, for explaining the predictions of code summarization models, a broad class of code models that predict the name of a method given its body. Although WheaCha employs the same mechanism of tracing model predictions back to input features, it differs from all existing attribution methods in crucial ways. Specifically, for any prediction of a learned code summarization model, WheaCha divides an input method into “wheat” (i.e., the defining features that are the reason the model predicts the label it predicts) and the remaining “chaff” (i.e., the auxiliary features that may help the model predict that label). We realize WheaCha in a tool, HuoYan, and use it to explain four prominent code summarization models: code2vec, code2seq, sequence GNN, and extreme summarizer. Results show that (1) HuoYan is efficient, taking on average under fifteen seconds to compute the wheat for an input method in an end-to-end fashion (i.e., including the model prediction time); (2) the wheat that all models use to predict input methods consists of simple syntactic or even lexical properties (i.e., identifier names); nevertheless, it is often intuitive and well aligned with the features humans would use to infer the name of the same methods; and (3) some of the most noteworthy attribution methods do not precisely capture the wheat that models use for prediction. Finally, we show that wheat naturally lends itself to a novel, powerful attack mechanism on code summarization models. An extensive evaluation shows that our attack mechanism significantly outperforms the state of the art in both non-targeted and targeted attacks and, more importantly, in many cases renders a popular defense technique, adversarial training, almost ineffective. Our work opens up an exciting new direction of studying what models have learned from source code.
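
To make the wheat/chaff distinction concrete, the minimal Python sketch below illustrates one way such a split could be computed. It is not the paper's HuoYan algorithm; the predict interface, the token-level view of a method body, and the greedy token-dropping strategy are all assumptions made for illustration only.

"""Minimal sketch of the wheat/chaff idea (not the paper's HuoYan algorithm).

Assumption: the model is exposed as a function `predict` that maps a list of
tokens from a method body to a predicted method name. "Wheat" is a token
subset that, fed to the model on its own, still yields the original
prediction; the remaining tokens are "chaff".
"""

from typing import Callable, List, Tuple


def split_wheat_chaff(
    tokens: List[str],
    predict: Callable[[List[str]], str],  # assumed model interface: tokens -> predicted name
) -> Tuple[List[str], List[str]]:
    original = predict(tokens)
    keep = list(range(len(tokens)))  # indices of tokens currently kept
    for i in range(len(tokens)):
        candidate = [j for j in keep if j != i]
        # Token i counts as chaff if the prediction survives without it.
        if candidate and predict([tokens[j] for j in candidate]) == original:
            keep = candidate
    wheat = [tokens[j] for j in keep]
    chaff = [tokens[j] for j in range(len(tokens)) if j not in keep]
    return wheat, chaff

Under these assumptions, the tokens that survive the greedy reduction play the role of wheat: feeding them alone to the model still produces the original predicted method name, while the dropped tokens are chaff that the prediction does not depend on.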
