Automated Comment Update: How Far are We?

Code comments are key to program comprehension. When they are inconsistent with the code, maintenance is hindered. Yet developers often forget to update comments as their code evolves. With recent advances in neural machine translation, the research community is exploring novel approaches for automatically generating up-to-date comments following code changes. CUP is one such state-of-the-art approach, but its promising performance has yet to be comprehensively assessed. Our study contributes to the literature with an in-depth analysis of the effectiveness of CUP. Our analysis reveals that the overall effectiveness of CUP is largely driven by its success in updating comments via a single-token change (96.6%). Several update failures occur when CUP ignores some code change information (10.4%) or when it is otherwise misled by additional information (12.8%). To put the achievements of CUP in perspective, we implement HebCup, a straightforward heuristic-based approach for code comment update. Building on our observations of CUP's successful and failed cases, we design heuristics that focus the update on the changed code and perform token-level comment updates. HebCup is shown to outperform CUP in terms of Accuracy by more than 60% while being over three orders of magnitude (i.e., 1700 times) faster. Further empirical analysis confirms that HebCup does not overfit to the empirical analysis set. Overall, with this study, we call for more research on deep-learning-based comment update, aimed at achieving state-of-the-art performance that would be unreachable by less sophisticated techniques.
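To make the idea of a token-level, change-focused heuristic concrete, the following is a minimal sketch and not the HebCup implementation itself, whose actual heuristics are not described in this abstract. It assumes a simple regex tokenizer and uses Python's `difflib` to find one-to-one token replacements in the code change, then applies the same substitutions to the comment. The function names `tokenize`, `changed_token_pairs`, and `update_comment` are hypothetical.

```python
import difflib
import re

# Assumed tokenizer: identifiers, integers, or any single non-space symbol.
TOKEN = re.compile(r"[A-Za-z_][A-Za-z0-9_]*|\d+|\S")


def tokenize(text):
    """Split code into a flat list of tokens."""
    return TOKEN.findall(text)


def changed_token_pairs(old_code, new_code):
    """Return (old_token, new_token) pairs for one-to-one token replacements."""
    old, new = tokenize(old_code), tokenize(new_code)
    matcher = difflib.SequenceMatcher(a=old, b=new, autojunk=False)
    pairs = []
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        # Only keep replacements of equal length so tokens align pairwise.
        if tag == "replace" and (i2 - i1) == (j2 - j1):
            pairs.extend(zip(old[i1:i2], new[j1:j2]))
    return pairs


def update_comment(old_comment, old_code, new_code):
    """Apply each code-level token replacement to whole words in the comment."""
    comment = old_comment
    for old_tok, new_tok in changed_token_pairs(old_code, new_code):
        if not re.match(r"[A-Za-z_]", old_tok):
            continue  # skip punctuation and numeric tokens
        pattern = re.compile(r"\b" + re.escape(old_tok) + r"\b")
        comment = pattern.sub(lambda m, repl=new_tok: repl, comment)
    return comment


if __name__ == "__main__":
    print(update_comment(
        "Returns the width of the widget.",
        "int getWidth() { return width; }",
        "int getHeight() { return height; }",
    ))
```

In this sketch the comment is touched only where a token that changed in the code also appears in the comment, which mirrors the abstract's emphasis on focusing the update on the changed code. Matching is case-sensitive and whole-word only, a simplification chosen for brevity.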
