Natural Language Processing and Program Analysis for Supporting Todo Comments as Software Evolves

Natural language elements (e.g., API comments, todo comments) form a substantial part of software repositories. Although developers routinely use these elements, such as todo comments, to communicate, software engineering techniques and tools often neglect their semantic content. Moreover, as software evolves and development teams reorganize, these elements are frequently forgotten or simply become outdated, imprecise, and irrelevant. We envision several techniques that combine natural language processing and program analysis to help developers maintain their todo comments. Specifically, we propose techniques to synthesize code from comments, make comments executable, answer questions in comments, improve comment quality, and detect dangling comments.
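
The abstract only names the idea of making todo comments executable. As a rough illustration under loudly stated assumptions, the Python sketch below handles a single hypothetical comment form, `TODO: <action> when <package> >= <version>`, and checks it against a map of installed package versions. The comment pattern, the names `ExecutableTodo`, `extract_todos`, and `version_at_least`, and the version map are all invented for this example. This is not the authors' technique.

```python
from __future__ import annotations

import re
from dataclasses import dataclass
from itertools import zip_longest

# Hypothetical todo form whose trigger is machine-checkable:
#   "TODO: <action> when <package> >= <version>"
TODO_PATTERN = re.compile(
    r"TODO:\s*(?P<action>.+?)\s+when\s+(?P<pkg>[\w.-]+)\s*>=\s*"
    r"(?P<version>\d+(?:\.\d+)*)",
    re.IGNORECASE,
)


def parse_version(text: str) -> tuple[int, ...]:
    """Turn '3.0.1' into (3, 0, 1)."""
    return tuple(int(part) for part in text.split("."))


def version_at_least(current: tuple[int, ...], required: tuple[int, ...]) -> bool:
    """Compare versions component-wise, padding the shorter one with zeros."""
    for have, need in zip_longest(current, required, fillvalue=0):
        if have != need:
            return have > need
    return True  # all components are equal


@dataclass
class ExecutableTodo:
    action: str
    package: str
    min_version: tuple[int, ...]

    def is_due(self, installed: dict[str, tuple[int, ...]]) -> bool:
        """True once the installed package reaches the version named in the comment."""
        current = installed.get(self.package)
        return current is not None and version_at_least(current, self.min_version)


def extract_todos(source: str) -> list[ExecutableTodo]:
    """Collect the todo comments that match the hypothetical trigger pattern."""
    return [
        ExecutableTodo(
            action=m.group("action"),
            package=m.group("pkg"),
            min_version=parse_version(m.group("version")),
        )
        for m in TODO_PATTERN.finditer(source)
    ]


if __name__ == "__main__":
    code = "x = legacy_call()  # TODO: remove this workaround when requests >= 3.0\n"
    for todo in extract_todos(code):
        status = "due" if todo.is_due({"requests": (2, 31, 0)}) else "not yet due"
        print(f"{todo.action!r}: {status}")
```

Run periodically (for example in continuous integration), a checker along these lines could report which todos have become actionable. Real todo comments are far more varied than one regular expression, which is presumably where the natural language processing side of the envisioned techniques would be needed.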
