Automatic generation of descriptive summary comments for methods in object-oriented programs

A software system is typically developed in several phases: analysis, design, coding, and testing. Once the software is released to the customer, it enters the maintenance phase. It is well established that maintenance consumes an inordinate share of the resources spent on a software project over its life cycle; estimates attribute 60 to 90% of the overall cost of software development to maintenance. An important reason for this high cost is the difficulty of understanding the software, which must be understood well enough to perform a required maintenance task correctly. Several studies have suggested that comments describing the code can ease the burden of program understanding. However, studies also suggest that there is a dearth of comments in software systems. Even in systems with many comments, the comments tend to be out of date with respect to the code, which renders them not only useless but potentially dangerous. This dissertation addresses the dearth of comments by generating comments automatically. Generating comments automatically also spares developers the tedious task of updating comments by hand, which they often forget to do in the rush to complete a maintenance task, and so can reduce the number of comments that are out of date with the code. An underlying hypothesis of this research is that succinct natural language descriptions of source code fragments, presented as comments, can reduce the amount of code a developer must read, which in turn decreases the time required to understand the code. By relying on the generated descriptions, a developer can quickly filter out code that is not germane to the current maintenance task and focus attention on the relevant code.
In particular, this dissertation focuses on automatically: (1) generating comments that summarize a given Java method, (2) identifying groupings of statements within a method that collectively implement a high-level action and generating a succinct description of the high-level action, and (3) generating comments that provide a high-level overview of a parameter's role in a method.
