Improved Code Summarization via a Graph Neural Network

Automatic source code summarization is the task of generating natural language descriptions for source code. Automatic code summarization is a rapidly expanding research area, especially as the community has taken greater advantage of advances in neural network and AI technologies. In general, source code summarization techniques use the source code as input and outputs a natural language description. Yet a strong consensus is developing that using structural information as input leads to improved performance. The first approaches to use structural information flattened the AST into a sequence. Recently, more complex approaches based on random AST paths or graph neural networks have improved on the models using flattened ASTs. However, the literature still does not describe the using a graph neural network together with source code sequence as separate inputs to a model. Therefore, in this paper, we present an approach that uses a graph-based neural architecture that better matches the default structure of the AST to generate these summaries. We evaluate our technique using a data set of 2.1 million Java method-comment pairs and show improvement over four baseline techniques, two from the software engineering literature, and two from machine learning literature.

[1]  Collin McMillan,et al.  A Neural Model for Generating Natural Language Summaries of Program Subroutines , 2019, 2019 IEEE/ACM 41st International Conference on Software Engineering (ICSE).

[2]  Zachary Eberhart,et al.  Adapting Neural Text Classification for Improved Software Categorization , 2018, 2018 IEEE International Conference on Software Maintenance and Evolution (ICSME).

[3]  Yansong Feng,et al.  Graph2Seq: Graph to Sequence Learning with Attention-based Neural Networks , 2018, ArXiv.

[4]  Anneliese Amschler Andrews,et al.  Program Comprehension During Software Maintenance and Evolution , 1995, Computer.

[5]  Mo Yu,et al.  Exploiting Rich Syntactic Information for Semantic Parsing with Graph-to-Sequence Model , 2018, EMNLP.

[6]  Rainer Koschke,et al.  How do professional developers comprehend software? , 2012, 2012 34th International Conference on Software Engineering (ICSE).

[7]  Arie van Deursen,et al.  A Systematic Survey of Program Comprehension through Dynamic Analysis , 2008, IEEE Transactions on Software Engineering.

[8]  Nicolas Anquetil,et al.  A study of the documentation essential to software maintenance , 2005, SIGDOC '05.

[9]  Premkumar T. Devanbu,et al.  Are deep neural networks the best choice for modeling source code? , 2017, ESEC/SIGSOFT FSE.

[10]  He Jiang,et al.  Summarizing Software Artifacts: A Literature Review , 2016, Journal of Computer Science and Technology.

[11]  Christian Bauckhage,et al.  Informed Machine Learning - Towards a Taxonomy of Explicit Integration of Knowledge into Machine Learning , 2019, ArXiv.

[12]  Klaus-Robert Müller,et al.  "What is relevant in a text document?": An interpretable machine learning approach , 2016, PloS one.

[13]  Lori L. Pollock,et al.  Automatically detecting and describing high level actions within methods , 2011, 2011 33rd International Conference on Software Engineering (ICSE).

[14]  Jeffrey C. Carver,et al.  Evaluating source code summarization techniques: Replication and expansion , 2013, 2013 21st International Conference on Program Comprehension (ICPC).

[15]  Chin-Yew Lin,et al.  ROUGE: A Package for Automatic Evaluation of Summaries , 2004, ACL 2004.

[16]  Kenny Q. Zhu,et al.  Automatic Generation of Text Descriptive Comments for Code Blocks , 2018, AAAI.

[17]  Been Kim,et al.  Towards A Rigorous Science of Interpretable Machine Learning , 2017, 1702.08608.

[18]  Tao Xie,et al.  An Empirical Study on Evolution of API Documentation , 2011, FASE.

[19]  Zelong Zhao,et al.  Learning to Generate Comments for API-Based Code Snippets , 2017 .

[20]  David Lo,et al.  Deep Code Comment Generation , 2018, 2018 IEEE/ACM 26th International Conference on Program Comprehension (ICPC).

[21]  Marc Brockschmidt,et al.  Learning to Represent Programs with Graphs , 2017, ICLR.

[22]  Noah A. Smith,et al.  Proceedings of the 54th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers) , 2016, ACL 2016.

[23]  Hang Li,et al.  “ Tony ” DNN Embedding for “ Tony ” Selective Read for “ Tony ” ( a ) Attention-based Encoder-Decoder ( RNNSearch ) ( c ) State Update s 4 SourceVocabulary Softmax Prob , 2016 .

[24]  David W. Binkley,et al.  Source Code Analysis: A Road Map , 2007, Future of Software Engineering (FOSE '07).

[25]  Ribana Roscher,et al.  Explainable Machine Learning for Scientific Insights and Discoveries , 2019, IEEE Access.

[26]  Omer Levy,et al.  code2seq: Generating Sequences from Structured Representations of Code , 2018, ICLR.

[27]  Philip S. Yu,et al.  A Comprehensive Survey on Graph Neural Networks , 2019, IEEE Transactions on Neural Networks and Learning Systems.

[28]  Mira Kajko-Mattsson,et al.  A Survey of Documentation Practice within Corrective Maintenance , 2004, Empirical Software Engineering.

[29]  Tim Miller,et al.  Explanation in Artificial Intelligence: Insights from the Social Sciences , 2017, Artif. Intell..

[30]  Collin McMillan,et al.  An Eye-Tracking Study of Java Programmers and Application to Source Code Summarization , 2015, IEEE Transactions on Software Engineering.

[31]  Lori L. Pollock,et al.  Automatic generation of natural language summaries for Java classes , 2013, 2013 21st International Conference on Program Comprehension (ICPC).

[32]  Quoc V. Le,et al.  Sequence to Sequence Learning with Neural Networks , 2014, NIPS.

[33]  Shujian Huang,et al.  Improved Neural Machine Translation with a Syntax-Aware Encoder and Decoder , 2017, ACL.

[34]  Andrian Marcus,et al.  On the Use of Automated Text Summarization Techniques for Summarizing Source Code , 2010, 2010 17th Working Conference on Reverse Engineering.

[35]  Philip S. Yu,et al.  Improving Automatic Source Code Summarization via Deep Reinforcement Learning , 2018, 2018 33rd IEEE/ACM International Conference on Automated Software Engineering (ASE).

[36]  Derek Doran,et al.  What Does Explainable AI Really Mean? A New Conceptualization of Perspectives , 2017, CEx@AI*IA.

[37]  Marc Brockschmidt,et al.  Structured Neural Summarization , 2018, ICLR.

[38]  Douglas Kramer,et al.  API documentation from source code comments: a case study of Javadoc , 1999, SIGDOC '99.

[39]  Hailong Sun,et al.  A Survey of Automatic Generation of Source Code Comments: Algorithms and Techniques , 2019, IEEE Access.

[40]  Jairo Aponte,et al.  On the Analysis of Human and Automatic Summaries of Source Code , 2012, CLEI Electron. J..

[41]  Emily Hill,et al.  Towards automatically generating summary comments for Java methods , 2010, ASE.

[42]  Salim Roukos,et al.  Bleu: a Method for Automatic Evaluation of Machine Translation , 2002, ACL.

[43]  Mohammed J. Zaki,et al.  Reinforcement Learning Based Graph-to-Sequence Model for Natural Question Generation , 2019, ICLR.

[44]  Timothy Lethbridge,et al.  The relevance of software documentation, tools and technologies: a survey , 2002, DocEng '02.

[45]  Andrian Marcus,et al.  On the Use of Domain Terms in Source Code , 2008, 2008 16th IEEE International Conference on Program Comprehension.

[46]  Collin McMillan,et al.  Automated feature discovery via sentence selection and source code summarization , 2016, J. Softw. Evol. Process..

[47]  Collin McMillan,et al.  Recommendations for Datasets for Source Code Summarization , 2019, NAACL.

[48]  Karl J. Ottenstein,et al.  The program dependence graph in a software development environment , 1984 .

[49]  Yutaka Matsuo,et al.  A Neural Architecture for Generating Natural Language Descriptions from Source Code Changes , 2017, ACL.

[50]  Alvin Cheung,et al.  Summarizing Source Code using a Neural Attention Model , 2016, ACL.

[51]  Geoffrey E. Hinton,et al.  Generating Text with Recurrent Neural Networks , 2011, ICML.

[52]  Lori L. Pollock,et al.  Automatically mining software-based, semantically-similar words from comment-code mappings , 2013, 2013 10th Working Conference on Mining Software Repositories (MSR).

[53]  Collin McMillan,et al.  Automatic Source Code Summarization of Context for Java Methods , 2016, IEEE Transactions on Software Engineering.

[54]  Jonathan I. Maletic,et al.  Lightweight Transformation and Fact Extraction with the srcML Toolkit , 2011, 2011 IEEE 11th International Working Conference on Source Code Analysis and Manipulation.

[55]  Shuai Lu,et al.  Summarizing Source Code with Transferred API Knowledge , 2018, IJCAI.

[56]  LiuCheng,et al.  Automated feature discovery via sentence selection and source code summarization , 2016 .

[57]  Yoshua Bengio,et al.  Neural Machine Translation by Jointly Learning to Align and Translate , 2014, ICLR.