ComFormer: Code Comment Generation via Transformer and Fusion Method-based Hybrid Code Representation

Developers often write low-quality code comments due to the lack of programming experience, which can reduce the efficiency of developers' program comprehension. Therefore, developers hope that code comment generation tools can be developed to illustrate the functionality and purpose of the code. Recently, researchers mainly model this problem as the neural machine translation problem and tend to use deep learning-based methods. In this study, we propose a novel method ComFormer based on Transformer and fusion method-based hybrid code presentation. Moreover, to alleviate OOV (out-of-vocabulary) problem and speed up model training, we further utilize the Byte-BPE algorithm to split identifiers and Sim_SBT method to perform AST Traversal. We compare ComFormer with seven state-of-the-art baselines from code comment generation and neural machine translation domains. Comparison results show the competitiveness of ComFormer in terms of three performance measures. Moreover, we perform a human study to verify that ComFormer can generate high-quality comments.

[1]  Shuai Lu,et al.  Summarizing Source Code with Transferred API Knowledge , 2018, IJCAI.

[2]  Alon Lavie,et al.  METEOR: An Automatic Metric for MT Evaluation with Improved Correlation with Human Judgments , 2005, IEEvaluation@ACL.

[3]  Jeffrey Pennington,et al.  GloVe: Global Vectors for Word Representation , 2014, EMNLP.

[4]  Christoph Treude,et al.  Automated Query Reformulation for Efficient Search Based on Query Logs From Stack Overflow , 2021, 2021 IEEE/ACM 43rd International Conference on Software Engineering (ICSE).

[5]  Philip S. Yu,et al.  Reinforcement-Learning-Guided Source Code Summarization Using Hierarchical Attention , 2022, IEEE Transactions on Software Engineering.

[6]  Andrian Marcus,et al.  On the Use of Automated Text Summarization Techniques for Summarizing Source Code , 2010, 2010 17th Working Conference on Reverse Engineering.

[7]  Collin McMillan,et al.  A Neural Model for Generating Natural Language Summaries of Program Subroutines , 2019, 2019 IEEE/ACM 41st International Conference on Software Engineering (ICSE).

[8]  Collin McMillan,et al.  Improved Code Summarization via a Graph Neural Network , 2020, 2020 IEEE/ACM 28th International Conference on Program Comprehension (ICPC).

[9]  Baishakhi Ray,et al.  A Transformer-based Approach for Source Code Summarization , 2020, ACL.

[10]  Emily Hill,et al.  Towards automatically generating summary comments for Java methods , 2010, ASE.

[11]  Taolue Chen,et al.  Augmenting Java method comments generation with context information based on neural networks , 2019, J. Syst. Softw..

[12]  Kenny Q. Zhu,et al.  Automatic Generation of Text Descriptive Comments for Code Blocks , 2018, AAAI.

[13]  Uri Alon,et al.  code2vec: learning distributed representations of code , 2018, Proc. ACM Program. Lang..

[14]  Jonathan I. Maletic,et al.  Using stereotypes in the automatic generation of natural language summaries for C++ methods , 2015, 2015 IEEE International Conference on Software Maintenance and Evolution (ICSME).

[15]  Collin McMillan,et al.  Improved Automatic Summarization of Subroutines via Attention to File Context , 2020, 2020 IEEE/ACM 17th International Conference on Mining Software Repositories (MSR).

[16]  Meng Wang,et al.  Hierarchical LSTM for Sign Language Translation , 2018, AAAI.

[17]  Minghui Zhou,et al.  A Neural Framework for Retrieval and Summarization of Source Code , 2018, 2018 33rd IEEE/ACM International Conference on Automated Software Engineering (ASE).

[18]  Qiang Fan,et al.  A Neural-Network based Code Summarization Approach by Using Source Code and its Call Dependencies , 2019, Internetware.

[19]  Jörg Tiedemann,et al.  An Analysis of Encoder Representations in Transformer-Based Machine Translation , 2018, BlackboxNLP@EMNLP.

[20]  Jeffrey C. Carver,et al.  Evaluating source code summarization techniques: Replication and expansion , 2013, 2013 21st International Conference on Program Comprehension (ICPC).

[21]  Quoc V. Le,et al.  Sequence to Sequence Learning with Neural Networks , 2014, NIPS.

[22]  Hailong Sun,et al.  Retrieval-based Neural Source Code Summarization , 2020, 2020 IEEE/ACM 42nd International Conference on Software Engineering (ICSE).

[23]  Salim Roukos,et al.  Bleu: a Method for Automatic Evaluation of Machine Translation , 2002, ACL.

[24]  Samy Bengio,et al.  Tensor2Tensor for Neural Machine Translation , 2018, AMTA.

[25]  Lin Tan,et al.  CloCom: Mining existing source code for automatic comment generation , 2015, 2015 IEEE 22nd International Conference on Software Analysis, Evolution, and Reengineering (SANER).

[26]  Lori L. Pollock,et al.  Automatic generation of natural language summaries for Java classes , 2013, 2013 21st International Conference on Program Comprehension (ICPC).

[27]  David Lo,et al.  Deep code comment generation with hybrid lexical and syntactical information , 2019, Empirical Software Engineering.

[28]  Douglas Kramer,et al.  API documentation from source code comments: a case study of Javadoc , 1999, SIGDOC '99.

[29]  Ming-Wei Chang,et al.  BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding , 2019, NAACL.

[30]  Charles A. Sutton,et al.  A Convolutional Attention Network for Extreme Summarization of Source Code , 2016, ICML.

[31]  Alexander M. Rush,et al.  Sequence-to-Sequence Learning as Beam-Search Optimization , 2016, EMNLP.

[32]  Ilya Sutskever,et al.  Language Models are Unsupervised Multitask Learners , 2019 .

[33]  Lori L. Pollock,et al.  Automatically detecting and describing high level actions within methods , 2011, 2011 33rd International Conference on Software Engineering (ICSE).

[34]  David Lo,et al.  Deep Code Comment Generation , 2018, 2018 IEEE/ACM 26th International Conference on Program Comprehension (ICPC).

[35]  Hao He,et al.  Understanding source code comments at large-scale , 2019, ESEC/SIGSOFT FSE.

[36]  Collin McMillan,et al.  Improving automated source code summarization via an eye-tracking study of programmers , 2014, ICSE.

[37]  Ming Li,et al.  Code Attention: Translating Code to Comments by Exploiting Domain Features , 2017, ArXiv.

[38]  Philip S. Yu,et al.  Improving Automatic Source Code Summarization via Deep Reinforcement Learning , 2018, 2018 33rd IEEE/ACM International Conference on Automated Software Engineering (ASE).

[39]  Chin-Yew Lin,et al.  ROUGE: A Package for Automatic Evaluation of Summaries , 2004, ACL 2004.

[40]  Wei Ye,et al.  Leveraging Code Generation to Improve Code Retrieval and Summarization via Dual Learning , 2020, WWW.

[41]  Chin-Yew Lin,et al.  Automatic Evaluation of Machine Translation Quality Using Longest Common Subsequence and Skip-Bigram Statistics , 2004, ACL.

[42]  David Lo,et al.  Assessing the Generalizability of Code2vec Token Embeddings , 2019, 2019 34th IEEE/ACM International Conference on Automated Software Engineering (ASE).

[43]  Alvin Cheung,et al.  Summarizing Source Code using a Neural Attention Model , 2016, ACL.

[44]  Yoshua Bengio,et al.  Neural Machine Translation by Jointly Learning to Align and Translate , 2014, ICLR.

[45]  Collin McMillan,et al.  An Eye-Tracking Study of Java Programmers and Application to Source Code Summarization , 2015, IEEE Transactions on Software Engineering.

[46]  Markus Freitag,et al.  Beam Search Strategies for Neural Machine Translation , 2017, NMT@ACL.

[47]  Zhenchang Xing,et al.  Measuring Program Comprehension: A Large-Scale Field Study with Professionals , 2018, IEEE Transactions on Software Engineering.

[48]  Kyunghyun Cho,et al.  Neural Machine Translation with Byte-Level Subwords , 2020, AAAI.

[49]  Lori L. Pollock,et al.  Generating Parameter Comments and Integrating with Method Summaries , 2011, 2011 IEEE 19th International Conference on Program Comprehension.

[50]  Xin Xia,et al.  Code Generation as a Dual Task of Code Summarization , 2019, NeurIPS.

[51]  Collin McMillan,et al.  Automatic Source Code Summarization of Context for Java Methods , 2016, IEEE Transactions on Software Engineering.

[52]  Xiaoran Wang,et al.  Automatically generating natural language descriptions for object-related statement sequences , 2017, 2017 IEEE 24th International Conference on Software Analysis, Evolution and Reengineering (SANER).

[53]  Lukasz Kaiser,et al.  Attention is All you Need , 2017, NIPS.

[54]  Andrian Marcus,et al.  Supporting program comprehension with source code summarization , 2010, 2010 ACM/IEEE 32nd International Conference on Software Engineering.