Reinforcement-Learning-Guided Source Code Summarization Using Hierarchical Attention

Code summarization (aka comment generation) provides a high-level natural language description of the function performed by code, which can benefit the software maintenance, code categorization and retrieval. To the best of our knowledge, the state-of-the-art approaches follow an encoder-decoder framework which encodes source code into a hidden space and later decodes it into a natural language space. Such approaches suffer from the following drawbacks: (a) they are mainly input by representing code as a sequence of tokens while ignoring code hierarchy; (b) most of the encoders only input simple features (e.g., tokens) while ignoring the features that can help capture the correlations between comments and code; (c) the decoders are typically trained to predict subsequent words by maximizing the likelihood of subsequent ground truth words, while in real world, they are excepted to generate the entire word sequence from scratch. As a result, such drawbacks lead to inferior and inconsistent comment generation accuracy. To address the above limitations, this paper presents a new code summarization approach using hierarchical attention network by incorporating multiple code features, including type-augmented abstract syntax trees and program control flows. Such features, along with plain code sequences, are injected into a deep reinforcement learning (DRL) framework (e.g., actor-critic network) for comment generation. Our approach assigns weights (pays “attention”) to tokens and statements when constructing the code representation to reflect the hierarchical code structure under different contexts regarding code features (e.g., control flows and abstract syntax trees). Our reinforcement learning mechanism further strengthens the prediction results through the actor network and the critic network, where the actor network provides the confidence of predicting subsequent words based on the current state, and the critic network computes the reward values of all the possible extensions of the current state to provide global guidance for explorations. Eventually, we employ an advantage reward to train both networks and conduct a set of experiments on a real-world dataset. The experimental results demonstrate that our approach outperforms the baselines by around 22 to 45 percent in BLEU-1 and outperforms the state-of-the-art approaches by around 5 to 60 percent in terms of S-BLEU and C-BLEU.

[1]  Peter Dayan,et al.  Q-learning , 1992, Machine Learning.

[2]  Minghui Zhou,et al.  A Neural Framework for Retrieval and Summarization of Source Code , 2018, 2018 33rd IEEE/ACM International Conference on Automated Software Engineering (ASE).

[3]  Philip S. Yu,et al.  Improving Automatic Source Code Summarization via Deep Reinforcement Learning , 2018, 2018 33rd IEEE/ACM International Conference on Automated Software Engineering (ASE).

[4]  Yang Liu,et al.  graph2vec: Learning Distributed Representations of Graphs , 2017, ArXiv.

[5]  Alon Lavie,et al.  METEOR: An Automatic Metric for MT Evaluation with Improved Correlation with Human Judgments , 2005, IEEvaluation@ACL.

[6]  Mark Harman,et al.  The Oracle Problem in Software Testing: A Survey , 2015, IEEE Transactions on Software Engineering.

[7]  L. D. Moura,et al.  Clone detection using abstract syntax trees , 1998, Proceedings. International Conference on Software Maintenance (Cat. No. 98CB36272).

[8]  Jianfeng Gao,et al.  Deep Reinforcement Learning for Dialogue Generation , 2016, EMNLP.

[9]  Naren Ramakrishnan,et al.  Deep Reinforcement Learning for Sequence-to-Sequence Models , 2018, IEEE Transactions on Neural Networks and Learning Systems.

[10]  David Lo,et al.  Deep Code Comment Generation , 2018, 2018 IEEE/ACM 26th International Conference on Program Comprehension (ICPC).

[11]  Jürgen Schmidhuber,et al.  Deep learning in neural networks: An overview , 2014, Neural Networks.

[12]  Byron C. Wallace,et al.  Attention is not Explanation , 2019, NAACL.

[13]  Daniel Tarlow,et al.  Structured Generative Models of Natural Source Code , 2014, ICML.

[14]  Cristina V. Lopes,et al.  From Query to Usable Code: An Analysis of Stack Overflow Code Snippets , 2016, 2016 IEEE/ACM 13th Working Conference on Mining Software Repositories (MSR).

[15]  Tomoki Toda,et al.  Learning to Generate Pseudo-Code from Source Code Using Statistical Machine Translation (T) , 2015, 2015 30th IEEE/ACM International Conference on Automated Software Engineering (ASE).

[16]  Farokh B. Bastani,et al.  A Framework for IoT-Based Monitoring and Diagnosis of Manufacturing Systems , 2017, 2017 IEEE Symposium on Service-Oriented System Engineering (SOSE).

[17]  Lantao Yu,et al.  SeqGAN: Sequence Generative Adversarial Nets with Policy Gradient , 2016, AAAI.

[18]  Nicolas Usunier,et al.  Improving Neural Language Models with a Continuous Cache , 2016, ICLR.

[19]  Dichao Hu,et al.  An Introductory Survey on Attention Mechanisms in NLP Problems , 2018, IntelliSys.

[20]  Lingming Zhang,et al.  Practical program repair via bytecode mutation , 2018, ISSTA.

[21]  Sufyan bin Uzayr GitHub , 2022, Mastering Git.

[22]  Leonidas J. Guibas,et al.  Learning Program Embeddings to Propagate Feedback on Student Code , 2015, ICML.

[23]  Mirella Lapata,et al.  Sentence Simplification with Deep Reinforcement Learning , 2017, EMNLP.

[24]  Alvin Cheung,et al.  Summarizing Source Code using a Neural Attention Model , 2016, ACL.

[25]  Nicolas Anquetil,et al.  A study of the documentation essential to software maintenance , 2005, SIGDOC '05.

[26]  Marc'Aurelio Ranzato,et al.  Sequence Level Training with Recurrent Neural Networks , 2015, ICLR.

[27]  Rico Sennrich,et al.  A Parallel Corpus of Python Functions and Documentation Strings for Automated Code Documentation and Code Generation , 2017, IJCNLP.

[28]  Yoram Singer,et al.  Adaptive Subgradient Methods for Online Learning and Stochastic Optimization , 2011, J. Mach. Learn. Res..

[29]  Yuqun Zhang,et al.  Simulee: Detecting CUDA Synchronization Bugs via Memory-Access Modeling , 2020, 2020 IEEE/ACM 42nd International Conference on Software Engineering (ICSE).

[30]  Jürgen Schmidhuber,et al.  Long Short-Term Memory , 1997, Neural Computation.

[31]  Diyi Yang,et al.  Hierarchical Attention Networks for Document Classification , 2016, NAACL.

[32]  Charles A. Sutton,et al.  A Convolutional Attention Network for Extreme Summarization of Source Code , 2016, ICML.

[33]  Shuai Lu,et al.  Summarizing Source Code with Transferred API Knowledge , 2018, IJCAI.

[34]  Richard S. Zemel,et al.  Gated Graph Sequence Neural Networks , 2015, ICLR.

[35]  Tao Wang,et al.  Convolutional Neural Networks over Tree Structures for Programming Language Processing , 2014, AAAI.

[36]  Ronald J. Williams,et al.  Simple Statistical Gradient-Following Algorithms for Connectionist Reinforcement Learning , 2004, Machine Learning.

[37]  Sarfraz Khurshid,et al.  An Empirical Study of Boosting Spectrum-Based Fault Localization via PageRank , 2021, IEEE Transactions on Software Engineering.

[38]  Lei Li,et al.  On Tree-Based Neural Sentence Modeling , 2018, EMNLP.

[39]  Alexander M. Rush,et al.  Sequence-to-Sequence Learning as Beam-Search Optimization , 2016, EMNLP.

[40]  Sarfraz Khurshid,et al.  An Information Retrieval Approach for Regression Test Prioritization Based on Program Changes , 2015, 2015 IEEE/ACM 37th IEEE International Conference on Software Engineering.

[41]  Yee Whye Teh,et al.  A fast and simple algorithm for training neural probabilistic language models , 2012, ICML.

[42]  Zhenchang Xing,et al.  Neural-Machine-Translation-Based Commit Message Generation: How Far Are We? , 2018, 2018 33rd IEEE/ACM International Conference on Automated Software Engineering (ASE).

[43]  Chin-Yew Lin,et al.  ROUGE: A Package for Automatic Evaluation of Summaries , 2004, ACL 2004.

[44]  Yoshua Bengio,et al.  Overcoming the Curse of Sentence Length for Neural Machine Translation using Automatic Segmentation , 2014, SSST@EMNLP.

[45]  Lihong Li,et al.  Neuro-Symbolic Program Synthesis , 2016, ICLR.

[46]  M A Branch,et al.  Software maintenance management , 1986 .

[47]  Wei Li,et al.  DeepFL: integrating multiple fault diagnosis dimensions for deep fault localization , 2019, ISSTA.

[48]  Ali Farhadi,et al.  Target-driven visual navigation in indoor scenes using deep reinforcement learning , 2016, 2017 IEEE International Conference on Robotics and Automation (ICRA).

[49]  Lingming Zhang,et al.  Hybrid Regression Test Selection , 2018, 2018 IEEE/ACM 40th International Conference on Software Engineering (ICSE).

[50]  Xiao Liu,et al.  SmartVM: a SLA-aware microservice deployment framework , 2018, World Wide Web.

[51]  Meng Wang,et al.  Do Pseudo Test Suites Lead to Inflated Correlation in Measuring Test Effectiveness? , 2019, 2019 12th IEEE Conference on Software Testing, Validation and Verification (ICST).

[52]  Yuqun Zhang,et al.  Characterizing and Detecting CUDA Program Bugs , 2019, ArXiv.

[53]  Demis Hassabis,et al.  Mastering the game of Go with deep neural networks and tree search , 2016, Nature.

[54]  David Lo,et al.  Deep code comment generation with hybrid lexical and syntactical information , 2019, Empirical Software Engineering.

[55]  Wei Li,et al.  DeepBillboard: Systematic Physical-World Testing of Autonomous Driving Systems , 2018, 2020 IEEE/ACM 42nd International Conference on Software Engineering (ICSE).

[56]  Tjalling Haije,et al.  Automatic Comment Generation using a Neural Translation Model , 2016 .

[57]  Christopher D. Manning,et al.  Effective Approaches to Attention-based Neural Machine Translation , 2015, EMNLP.

[58]  Eduardo F. Morales,et al.  An Introduction to Reinforcement Learning , 2011 .

[59]  Richard S. Sutton,et al.  Reinforcement Learning: An Introduction , 1998, IEEE Trans. Neural Networks.

[60]  Alex Graves,et al.  Neural Machine Translation in Linear Time , 2016, ArXiv.

[61]  Omer Levy,et al.  code2seq: Generating Sequences from Structured Representations of Code , 2018, ICLR.

[62]  William W. Cohen,et al.  Natural Language Models for Predicting Programming Comments , 2013, ACL.

[63]  Tommi A Pirinen,et al.  Neural and rule-based Finnish NLP models—expectations, experiments and experiences , 2019, Proceedings of the Fifth International Workshop on Computational Linguistics for Uralic Languages.

[64]  Michael D. Ernst,et al.  Automated documentation inference to explain failed tests , 2011, 2011 26th IEEE/ACM International Conference on Automated Software Engineering (ASE 2011).

[65]  Truyen Tran,et al.  A deep language model for software code , 2016, FSE 2016.

[66]  Yishay Mansour,et al.  Policy Gradient Methods for Reinforcement Learning with Function Approximation , 1999, NIPS.

[67]  Sarfraz Khurshid,et al.  DeepRoad: GAN-Based Metamorphic Testing and Input Validation Framework for Autonomous Driving Systems , 2018, 2018 33rd IEEE/ACM International Conference on Automated Software Engineering (ASE).

[68]  Hal Daumé,et al.  Reinforcement Learning for Bandit Neural Machine Translation with Simulated Human Feedback , 2017, EMNLP.

[69]  Vijay R. Konda,et al.  Actor-Critic Algorithms , 1999, NIPS.

[70]  Jiachen Zhang,et al.  Precise Condition Synthesis for Program Repair , 2016, 2017 IEEE/ACM 39th International Conference on Software Engineering (ICSE).

[71]  Abhik Roychoudhury,et al.  Codeflaws: A Programming Competition Benchmark for Evaluating Automated Program Repair Tools , 2017, 2017 IEEE/ACM 39th International Conference on Software Engineering Companion (ICSE-C).

[72]  Xiaochen Li,et al.  Query Expansion Based on Crowd Knowledge for Code Search , 2016, IEEE Transactions on Services Computing.

[73]  Mira Kajko-Mattsson,et al.  A Survey of Documentation Practice within Corrective Maintenance , 2004, Empirical Software Engineering.

[74]  Andrew D. Gordon,et al.  Bimodal Modelling of Source Code and Natural Language , 2015, ICML.

[75]  Devin Chollak,et al.  Bugram: Bug detection with n-gram language models , 2016, 2016 31st IEEE/ACM International Conference on Automated Software Engineering (ASE).

[76]  Huan Sun,et al.  CoaCor: Code Annotation for Code Retrieval with Reinforcement Learning , 2019, WWW.

[77]  Hiroaki Yoshida,et al.  Anti-patterns in search-based program repair , 2016, SIGSOFT FSE.

[78]  R. Rosenfeld,et al.  Two decades of statistical language modeling: where do we go from here? , 2000, Proceedings of the IEEE.

[79]  Ning Zhang,et al.  Deep Reinforcement Learning-Based Image Captioning with Embedding Reward , 2017, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[80]  Anna Korhonen,et al.  Isomorphic Transfer of Syntactic Structures in Cross-Lingual NLP , 2018, ACL.

[81]  Quoc V. Le,et al.  Sequence to Sequence Learning with Neural Networks , 2014, NIPS.

[82]  Yuqun Zhang,et al.  Automating CUDA Synchronization via Program Transformation , 2019, 2019 34th IEEE/ACM International Conference on Automated Software Engineering (ASE).

[83]  Salim Roukos,et al.  Bleu: a Method for Automatic Evaluation of Machine Translation , 2002, ACL.

[84]  Emily Hill,et al.  Towards automatically generating summary comments for Java methods , 2010, ASE.

[85]  David Lo,et al.  Information retrieval and spectrum based bug localization: better together , 2015, ESEC/SIGSOFT FSE.

[86]  Pekka Abrahamsson,et al.  Agile Software Development Methods: Review and Analysis , 2017, ArXiv.

[87]  Shane Legg,et al.  Human-level control through deep reinforcement learning , 2015, Nature.

[88]  Xiaodong Gu,et al.  Deep API learning , 2016, SIGSOFT FSE.

[89]  Wang Ling,et al.  Latent Predictor Networks for Code Generation , 2016, ACL.

[90]  Anh Tuan Nguyen,et al.  Automatic Categorization with Deep Neural Network for Open-Source Java Projects , 2017, 2017 IEEE/ACM 39th International Conference on Software Engineering Companion (ICSE-C).

[91]  Guodong Zhou,et al.  Modeling Source Syntax for Neural Machine Translation , 2017, ACL.

[92]  Mert Kilickaya,et al.  Re-evaluating Automatic Metrics for Image Captioning , 2016, EACL.

[93]  Yoshua Bengio,et al.  Neural Machine Translation by Jointly Learning to Align and Translate , 2014, ICLR.

[94]  Yang Liu,et al.  Large-Scale Analysis of Framework-Specific Exceptions in Android Apps , 2018, 2018 IEEE/ACM 40th International Conference on Software Engineering (ICSE).