Automated Identifier Completion and Replacement

Various studies indicate that concise and consistent identifiers improve the quality of source code and thereby have a positive effect on program understanding and maintenance. To write concise and consistent identifiers, however, developers need some knowledge of the concepts captured in the source code and of how those concepts are named. Acquiring such knowledge directly from the source code may be feasible for small systems, but it is not viable for large ones. In this paper, we propose an automated approach that exploits concepts and relations automatically extracted from the source code to suggest identifiers. Suggestions are ranked according to the context in which a new identifier is introduced, and they can be used either to complete the identifier being written or to replace it with a more appropriate one. To validate the proposed approach, we conducted a case study that simulates the activities of a developer naming identifiers. The results show that in the majority of cases our approach provides completion suggestions that match the identifiers actually used by the developers.
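
To make the idea of context-ranked completion concrete, the following is a minimal sketch and not the approach evaluated in the paper. It assumes a much simpler model: identifiers already present in the code are split into terms, and candidates that start with the typed prefix are ranked by how many of their terms also occur in the surrounding context. The function names `split_terms` and `suggest`, the term-overlap score, and the sample identifiers are all hypothetical; the paper itself relies on concepts and relations extracted from the source code.

```python
import re
from collections import Counter


def split_terms(identifier):
    """Split a camelCase or snake_case identifier into lowercase terms."""
    parts = re.findall(r"[A-Z]+(?![a-z])|[A-Z]?[a-z]+|\d+", identifier)
    return [p.lower() for p in parts]


def suggest(prefix, context_terms, known_identifiers, limit=5):
    """Rank known identifiers that start with `prefix`.

    Assumption for illustration only: the score is the number of the
    candidate's terms that also appear in the context terms. Ties are
    broken alphabetically so the ordering is deterministic.
    """
    context = Counter(t.lower() for t in context_terms)
    scored = []
    for name in set(known_identifiers):
        if not name.lower().startswith(prefix.lower()):
            continue
        overlap = sum(context[t] for t in split_terms(name))
        scored.append((-overlap, name))
    return [name for _, name in sorted(scored)[:limit]]


# Hypothetical vocabulary harvested from an existing code base.
known = ["accountBalance", "accountOwner", "addressLine", "accountId"]

# The developer types "acc" inside a method named computeBalance.
context = split_terms("computeBalance")
suggestions = suggest("acc", context, known)
```

In this toy setting the same ranked list could serve both uses named in the abstract: offering completions for a partially typed identifier, or proposing a replacement when the typed name is not the top-ranked candidate.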
