Automatic Code Summarization via ChatGPT: How Far Are We?

To help software developers understand and maintain programs, various automatic code summarization techniques have been proposed to generate a concise natural language comment for a given code snippet. Recently, the emergence of large language models (LLMs) has greatly boosted performance on natural language processing tasks. Among them, ChatGPT is the most popular and has attracted wide attention from the software engineering community. However, it remains unclear how ChatGPT performs in automatic code summarization. In this paper, we therefore evaluate ChatGPT on a widely used Python dataset, CSN-Python, and compare it with several state-of-the-art (SOTA) code summarization models. Specifically, we first explore an appropriate prompt to guide ChatGPT to generate in-distribution comments. We then use this prompt to ask ChatGPT to generate comments for all code snippets in the CSN-Python test set. We adopt three widely used metrics (BLEU, METEOR, and ROUGE-L) to measure the quality of the comments generated by ChatGPT and by the SOTA models (NCS, CodeBERT, and CodeT5). The experimental results show that, in terms of BLEU and ROUGE-L, ChatGPT's code summarization performance is significantly worse than that of all three SOTA models. We also present several cases and discuss the advantages and disadvantages of ChatGPT in code summarization. Based on these findings, we outline several open challenges and opportunities in ChatGPT-based code summarization.
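As a rough illustration of how a generated comment can be scored against a reference comment, the following minimal sketch computes a ROUGE-L style score from the longest common subsequence (LCS) of the two token sequences. It is not the evaluation script used in the paper: the function names `lcs_length` and `rouge_l_f1`, the lowercase whitespace tokenization, the balanced F1 weighting, and the example comments are all assumptions made for illustration.

```python
def lcs_length(a, b):
    """Return the length of the longest common subsequence of two token lists."""
    prev = [0] * (len(b) + 1)
    for tok_a in a:
        curr = [0]
        for j, tok_b in enumerate(b, start=1):
            if tok_a == tok_b:
                curr.append(prev[j - 1] + 1)
            else:
                curr.append(max(prev[j], curr[j - 1]))
        prev = curr
    return prev[-1]


def rouge_l_f1(reference, candidate):
    """ROUGE-L style F1 between a reference and a generated comment.

    Assumption: lowercase whitespace tokenization and equal weighting of
    precision and recall; published setups may tokenize and weight differently.
    """
    ref_tokens = reference.lower().split()
    cand_tokens = candidate.lower().split()
    if not ref_tokens or not cand_tokens:
        return 0.0
    lcs = lcs_length(ref_tokens, cand_tokens)
    if lcs == 0:
        return 0.0
    precision = lcs / len(cand_tokens)  # share of generated tokens in the LCS
    recall = lcs / len(ref_tokens)      # share of reference tokens in the LCS
    return 2 * precision * recall / (precision + recall)


if __name__ == "__main__":
    # Hypothetical reference and generated comments, not taken from CSN-Python.
    reference = "return the sum of two numbers"
    generated = "returns the sum of the two given numbers"
    print(round(rouge_l_f1(reference, generated), 3))
```

A longer generated comment can cover most of the reference and still score lower, because the extra tokens reduce precision. This is one reason why steering the model toward in-distribution comments matters when overlap-based metrics are used.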
