Developing a model of loop actions by mining loop characteristics from a large code corpus

Some high level algorithmic steps require more than one statement to implement, but are not large enough to be a method on their own. Specifically, many algorithmic steps (e.g., count, compare pairs of elements, find the maximum) are implemented as loop structures, which lack the higher level abstraction of the action being performed, and can negatively affect both human readers and automatic tools. Additionally, in a study of 14,317 projects, we found that less than 20% of loops are documented to help readers. In this paper, we present a novel automatic approach to identify the high level action implemented by a given loop. We leverage the available, large source of high-quality open source projects to mine loop characteristics and develop an action identification model. We use the model and feature vectors extracted from loop code to automatically identify the high level actions implemented by loops. We have evaluated the accuracy of the loop action identification and coverage of the model over 7159 open source programs. The results show great promise for this approach to automatically insert internal comments and provide additional higher level naming for loop actions to be used by tools such as code search.

[1]  Letha H. Etzkorn,et al.  The language of comments in computer software: A sublanguage of English , 2001 .

[2]  Jinqiu Yang,et al.  AutoComment: Mining question and answer sites for automatic comment generation , 2013, 2013 28th IEEE/ACM International Conference on Automated Software Engineering (ASE).

[3]  Jens Krinke,et al.  Identifying similar code with program dependence graphs , 2001, Proceedings Eighth Working Conference on Reverse Engineering.

[4]  Yuanyuan Zhou,et al.  Listening to programmers — Taxonomies and characteristics of comments in operating system code , 2009, 2009 IEEE 31st International Conference on Software Engineering.

[5]  Hinrich Schütze,et al.  Introduction to information retrieval , 2008 .

[6]  Lori L. Pollock,et al.  Automatically detecting and describing high level actions within methods , 2011, 2011 33rd International Conference on Software Engineering (ICSE).

[7]  Michael W. Godfrey,et al.  Supporting the analysis of clones in software systems: Research Articles , 2006 .

[8]  Katsuhiko Gondow,et al.  Toward mining "concept keywords" from identifiers in large software projects , 2005, MSR.

[9]  Ted Tenny,et al.  Program Readability: Procedures Versus Comments , 1988, IEEE Trans. Software Eng..

[10]  Mira Kajko-Mattsson,et al.  The state of documentation practice within corrective maintenance , 2001, Proceedings IEEE International Conference on Software Maintenance. ICSM 2001.

[11]  Collin McMillan,et al.  Improving automated source code summarization via an eye-tracking study of programmers , 2014, ICSE.

[12]  Lori L. Pollock,et al.  Automatic generation of natural language summaries for Java classes , 2013, 2013 21st International Conference on Program Comprehension (ICPC).

[13]  Koushik Sen,et al.  CodeHint: dynamic and interactive synthesis of code snippets , 2014, ICSE.

[14]  Pierre N. Robillard Automating comments , 1989, SIGP.

[15]  Christopher D. Hundhausen,et al.  On the design of an educational infrastructure for the blind and visually impaired in computer science , 2011, SIGCSE.

[16]  Nicolas Anquetil,et al.  A study of the documentation essential to software maintenance , 2005, SIGDOC '05.

[17]  Westley Weimer,et al.  Synthesizing API usage examples , 2012, 2012 34th International Conference on Software Engineering (ICSE).

[18]  Ettore Merlo,et al.  Experiment on the automatic detection of function clones in a software system using metrics , 1996, 1996 Proceedings of International Conference on Software Maintenance.

[19]  Robert D. Macredie,et al.  The effects of comments and identifier names on program comprehensibility: an experimental investigation , 1996, J. Program. Lang..

[20]  Hoan Anh Nguyen,et al.  Graph-based mining of multiple object usage patterns , 2009, ESEC/FSE '09.

[21]  Michael W. Godfrey,et al.  Supporting the analysis of clones in software systems , 2006, J. Softw. Maintenance Res. Pract..

[22]  Shinji Kusumoto,et al.  CCFinder: A Multilinguistic Token-Based Code Clone Detection System for Large Scale Source Code , 2002, IEEE Trans. Software Eng..

[23]  Lori L. Pollock,et al.  Automatically mining software-based, semantically-similar words from comment-code mappings , 2013, 2013 10th Working Conference on Mining Software Repositories (MSR).

[24]  Santonu Sarkar,et al.  Mining business topics in source code using latent dirichlet allocation , 2008, ISEC '08.

[25]  Andreas Zeller,et al.  Detecting object usage anomalies , 2007, ESEC-FSE '07.

[26]  Charles A. Sutton,et al.  Mining idioms from source code , 2014, SIGSOFT FSE.

[27]  Jian Pei,et al.  MAPO: Mining and Recommending API Usage Patterns , 2009, ECOOP.

[28]  Emily Hill,et al.  Automatically capturing source code context of NL-queries for software maintenance and reuse , 2009, 2009 IEEE 31st International Conference on Software Engineering.

[29]  Stéphane Ducasse,et al.  Semantic clustering: Identifying topics in source code , 2007, Inf. Softw. Technol..

[30]  Emily Hill,et al.  Towards automatically generating summary comments for Java methods , 2010, ASE.

[31]  Andrian Marcus,et al.  On the Use of Automated Text Summarization Techniques for Summarizing Source Code , 2010, 2010 17th Working Conference on Reverse Engineering.

[32]  Xiaoran Wang,et al.  Automatic Segmentation of Method Code into Meaningful Blocks to Improve Readability , 2011, 2011 18th Working Conference on Reverse Engineering.

[33]  Lior Rokach,et al.  Clustering Methods , 2005, The Data Mining and Knowledge Discovery Handbook.

[34]  Stéphane Ducasse,et al.  A language independent approach for detecting duplicated code , 1999, Proceedings IEEE International Conference on Software Maintenance - 1999 (ICSM'99). 'Software Maintenance for Business Change' (Cat. No.99CB36360).

[35]  Steve McConnell,et al.  Code Complete, Second Edition , 2004 .

[36]  Charles A. Sutton,et al.  Mining source code repositories at massive scale using language modeling , 2013, 2013 10th Working Conference on Mining Software Repositories (MSR).