An empirical study of rules for well-formed identifiers: Research Articles

Readers of programs have two main sources of domain information: identifier names and comments. In order to efficiently maintain source code, it is important that the identifier names (as well as comments) communicate clearly the concepts they represent. Deisenbock and Pizka recently introduced two rules for creating well-formed identifiers: one considers the consistency of identifiers and the other their conciseness. These rules require a mapping from identifiers to the concepts they represent, which may be costly to develop after the initial release of a system. An approach for verifying whether identifiers are well formed without any additional information (e.g., a concept mapping) is developed. Using a pool of 48 million lines of code, experiments with the resulting syntactic rules for well-formed identifiers illustrate that violations of the syntactic pattern exist. Two case studies show that three-quarters of these violations are ‘real’. That is, they could be identified using a concept mapping. Three related studies show that programmers tend to use a rather limited vocabulary, that, contrary to many other aspects of system evolution, maintenance does not introduce additional rule violations, and that open and proprietary sources differ in their percentage of violations. Copyright © 2007 John Wiley & Sons, Ltd.

[1]  Paolo Tonella,et al.  Restructuring program identifier names , 2000, Proceedings 2000 International Conference on Software Maintenance.

[2]  Christopher H. Morrell,et al.  Linear Transformations of Linear Mixed-Effects Models , 1997 .

[3]  G. Molenberghs,et al.  Linear Mixed Models for Longitudinal Data , 2001 .

[4]  Amela Karahasanovic,et al.  A survey of controlled experiments in software engineering , 2005, IEEE Transactions on Software Engineering.

[5]  Giuliano Antoniol,et al.  Recovering Traceability Links between Code and Documentation , 2002, IEEE Trans. Software Eng..

[6]  Gail E. Kaiser,et al.  An Information Retrieval Approach For Automatically Constructing Software Libraries , 1991, IEEE Trans. Software Eng..

[7]  David W. Binkley,et al.  What’s in a Name? A Study of Identifiers , 2006, 14th IEEE International Conference on Program Comprehension (ICPC'06).

[8]  Nicolas Anquetil,et al.  Assessing the relevance of identifier names in a legacy software system , 1998, CASCON.

[9]  Beszé Columbus - Reverse Engineering Tool and Schema for C++ , 2002 .

[10]  Christiane Fellbaum,et al.  Book Reviews: WordNet: An Electronic Lexical Database , 1999, CL.

[11]  Markus Pizka,et al.  Concise and consistent naming , 2005, 13th International Workshop on Program Comprehension (IWPC'05).

[12]  Andrian Marcus,et al.  An information retrieval approach to concept location in source code , 2004, 11th Working Conference on Reverse Engineering.

[13]  Paolo Tonella,et al.  Nomen est omen: analyzing the language of function identifiers , 1999, Sixth Working Conference on Reverse Engineering (Cat. No.PR00303).

[14]  Akif Günes Koru,et al.  Comparing high-change modules and modules with the highest measurement values in two large-scale open-source products , 2005, IEEE Transactions on Software Engineering.

[15]  Tim Menzies,et al.  Data Mining Static Code Attributes to Learn Defect Predictors , 2007 .

[16]  Harry M. Sneed Object-oriented COBOL recycling , 1996, Proceedings of WCRE '96: 4rd Working Conference on Reverse Engineering.

[17]  Katsuro Inoue,et al.  Automatic categorization algorithm for evolvable software archive , 2003, Sixth International Workshop on Principles of Software Evolution, 2003. Proceedings..

[18]  Robert D. Macredie,et al.  The effects of comments and identifier names on program comprehensibility: an experimental investigation , 1996, J. Program. Lang..

[19]  Andrian Marcus,et al.  Identification of high-level concept clones in source code , 2001, Proceedings 16th Annual International Conference on Automated Software Engineering (ASE 2001).

[20]  Markus Pizka,et al.  Concise and Consistent Naming , 2005, IWPC.

[21]  Juergen Rilling,et al.  Identifying comprehension bottlenecks using program slicing and cognitive complexity metrics , 2003, 11th IEEE International Workshop on Program Comprehension, 2003..