An Empirical Study on Using Large Language Models for Multi-Intent Comment Generation

Code comment generation aims to produce natural language descriptions of a code snippet to support developers' program comprehension activities. Despite long-standing research, a bottleneck of existing approaches is that they generate only a single comment for a given code snippet, whereas developers usually need information from diverse perspectives, such as what the snippet's functionality is and how to use it. To tackle this limitation, this study empirically investigates the feasibility of utilizing large language models (LLMs) to generate comments that fulfill developers' diverse intents. Our intuition rests on two facts: (1) code and its paired comment are used during the pre-training of LLMs to build the semantic connection between natural language and programming language, and (2) comments in real-world projects, from which the pre-training data are collected, usually reflect developers' different intents. We thus postulate that after pre-training, LLMs can already understand code from different perspectives. Indeed, experiments on two large-scale datasets demonstrate the soundness of our insight: by adopting the in-context learning paradigm and giving adequate prompts to the LLM (e.g., providing it with ten or more examples), the LLM can significantly outperform a state-of-the-art supervised learning approach at generating comments with multiple intents. Results also show that customized strategies for constructing prompts and post-processing strategies for reranking the results can both boost the LLM's performance, which sheds light on future research directions for using LLMs to achieve comment generation.
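The recipe the abstract describes, intent-conditioned few-shot prompting followed by candidate reranking, can be illustrated with a short sketch. This is a minimal illustration under stated assumptions, not the paper's implementation: the OpenAI-style client, the model name, the intent labels, and the helper names (`build_prompt`, `generate_comment`, `rerank`) are all introduced here for clarity, and a real pipeline would retrieve demonstration pairs and score candidates with the paper's own strategies.

```python
# Minimal sketch of intent-conditioned few-shot comment generation with
# candidate reranking. All names below are illustrative assumptions, not
# the paper's exact setup; it assumes an OpenAI-style chat-completion API.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

# Hypothetical intent taxonomy; the study targets multiple such intents.
INTENT_INSTRUCTIONS = {
    "what": "Summarize the functionality of the code.",
    "why": "Explain the purpose or design rationale of the code.",
    "usage": "Describe how to use the code.",
}

def build_prompt(intent: str, examples: list[tuple[str, str]], query_code: str) -> str:
    """Assemble a few-shot prompt: an intent instruction followed by ten
    or more (code, comment) demonstration pairs, then the query code."""
    parts = [INTENT_INSTRUCTIONS[intent]]
    for code, comment in examples:
        parts.append(f"Code:\n{code}\nComment: {comment}")
    parts.append(f"Code:\n{query_code}\nComment:")
    return "\n\n".join(parts)

def generate_comment(intent, examples, query_code, model="gpt-3.5-turbo"):
    """Sample several candidate comments so a reranker can pick the best."""
    prompt = build_prompt(intent, examples, query_code)
    resp = client.chat.completions.create(
        model=model,
        messages=[{"role": "user", "content": prompt}],
        temperature=0.7,
        n=5,
    )
    return [choice.message.content.strip() for choice in resp.choices]

def rerank(candidates: list[str], reference_comments: list[str]) -> str:
    """Toy post-processing reranker: prefer the candidate with the highest
    token-level Jaccard similarity to the demonstrations' comments."""
    def jaccard(a: str, b: str) -> float:
        ta, tb = set(a.lower().split()), set(b.lower().split())
        return len(ta & tb) / len(ta | tb) if ta | tb else 0.0
    return max(candidates, key=lambda c: max(jaccard(c, r) for r in reference_comments))
```

As a usage sketch, one would retrieve ten or more (code, comment) pairs for the target intent, call `generate_comment` on the query snippet, and pass the sampled candidates through `rerank`; lexical Jaccard is only a stand-in for whatever reranking signal a real system would use.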
