Automatically generating natural language descriptions for object-related statement sequences

Current source code analyses that drive software maintenance tools treat a method as either a single unit or a set of individual statements or words. They often leverage method names and any existing internal comments. However, internal comments are rare, and method names typically do not capture the method's multiple high-level algorithmic steps, which are too small to be a method of their own but require more than one statement to implement. Previous work demonstrated the feasibility of automatically identifying high-level actions for loops; however, many high-level actions remain unaddressed and undocumented, particularly sequences of consecutive statements that are associated with each other primarily by object references. We call these object-related action units. In this paper, we present an approach to automatically generate natural language descriptions of object-related action units within methods. We leverage the large body of available high-quality open source projects to learn templates of object-related actions, identify the statement that can represent the main action, and generate natural language descriptions for these actions. Our evaluation study on a set of 100 object-related statement sequences showed that the approach is promising for automatically identifying the action and its arguments and for generating natural language descriptions.
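To make the notion of an object-related action unit concrete, the following is a minimal sketch, not the approach presented in the paper. It assumes a deliberately naive heuristic: a unit starts at a statement that creates an object with `new` and extends over the consecutive statements that mention that object's variable name. The function name `object_related_units`, the regular expressions, and the sample Java statements are all hypothetical and chosen only for illustration.

```python
import re

# Assumption: a unit begins at a simple declaration such as `Type name = new ...`.
NEW_OBJECT = re.compile(r"^\s*[\w<>\[\]]+\s+(\w+)\s*=\s*new\b")
IDENTIFIER = re.compile(r"[A-Za-z_]\w*")


def object_related_units(statements):
    """Group consecutive statements that refer to a newly created object.

    Returns a list of (variable_name, statements_in_unit) pairs. Only units
    with more than one statement are reported. This is a toy heuristic: it
    works on raw text, so an identifier inside a string literal also counts.
    """
    units = []
    i = 0
    while i < len(statements):
        match = NEW_OBJECT.match(statements[i])
        if not match:
            i += 1
            continue
        obj = match.group(1)
        j = i + 1
        # Extend the unit while the next statement still mentions the object.
        while j < len(statements) and obj in IDENTIFIER.findall(statements[j]):
            j += 1
        if j - i > 1:
            units.append((obj, statements[i:j]))
        i = j
    return units


# Hypothetical Java method body, one statement per string.
statements = [
    "int total = count + 1;",
    'JButton okButton = new JButton("OK");',
    "okButton.setEnabled(true);",
    "okButton.addActionListener(listener);",
    "panel.add(okButton);",
    'log.info("done");',
]

units = object_related_units(statements)
for name, unit in units:
    print(name, len(unit))
```

Here the four statements from the `JButton` creation through `panel.add(okButton)` form one unit. A reader might summarize it as something like "add OK button to panel", where the last statement carries the main action; how the main action statement is actually selected and phrased is the subject of the paper and is not modeled by this sketch.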
