Comment Generation for Source Code: State of the Art, Challenges and Opportunities

Research has shown that most of the effort in today's software development goes into maintenance and evolution. Developers often rely on integrated development environments, debuggers, and tools for code search, testing, and program understanding to reduce tedious tasks. One way to make software development more efficient is to make programs more readable. Many approaches have been proposed and developed for this purpose. Among them, comment generation for source code is attracting growing attention and has become a popular research area. This paper summarizes the state of the art in comment generation research and discusses the challenges and future opportunities.
