Automatic Code Summarization: A Systematic Literature Review

Background: Comprehending program code is key to successful software development and maintenance. High-quality comments help developers understand programs, but in practice they are often missing or outdated. Automatic code summarization has been proposed to address these problems. Considerable progress has been made in this field over the last decade, but an up-to-date survey is lacking. Aims: We studied publications on code summarization in the field of program comprehension to investigate state-of-the-art approaches. By reading and analyzing the relevant articles, we aim to obtain a comprehensive understanding of the current status of automatic code summarization. Method: We performed a systematic literature review of the automatic source code summarization field, then synthesized the extracted data and compared the different approaches. Results: We collected and analyzed 41 selected studies from different research communities. For those works, we investigated and described in detail the data extraction techniques, description generation methods, evaluation methods, and relevant artifacts. Conclusions: Our systematic review provides an overview of the state of the art. By elaborating on current approaches in the field, it also points to future research directions in program comprehension and comment generation.
