How Developers Choose Names

The names of variables and functions serve as implicit documentation and are instrumental for program comprehension. However, choosing good, meaningful names is hard. We perform a sequence of experiments in which a total of 334 subjects are asked to choose names in given programming scenarios. The first experiment shows that the probability that two developers would select the same name is low: across the 47 instances in our experiments, the median probability was only 6.9%. At the same time, once a specific name is chosen, it is usually understood by the majority of developers. Analysis of the names given in the experiment suggests a model in which naming is a (not necessarily cognizant or serial) three-step process: (1) selecting the concepts to include in the name, (2) choosing the words to represent each concept, and (3) constructing a name from these words. A follow-up experiment, using the same experimental setup, then checked whether using this model explicitly can improve the quality of names. Names selected by subjects using the model were judged by two independent judges to be superior to those chosen in the original experiment by a ratio of two to one. Using the model appears to encourage the use of more concepts and longer names.
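One way to make the "probability that two developers would select the same name" concrete is to treat each scenario's responses as a sample and count how many pairs of distinct subjects chose an identical name. The sketch below does that by exact string match and takes the median across scenarios. It is an illustration under stated assumptions, not the study's analysis: the pairwise estimator, the lack of any name normalization, and the scenario data (`scenarios`, `same_name_probability`) are all hypothetical.

```python
from collections import Counter
from statistics import median


def same_name_probability(names: list[str]) -> float:
    """Probability that two distinct, randomly chosen subjects picked the same name.

    Assumption: names are compared by exact string match (no case or
    abbreviation normalization), and pairs are drawn without replacement.
    """
    n = len(names)
    if n < 2:
        raise ValueError("need at least two names")
    counts = Counter(names)
    # Ordered pairs of different subjects who chose the same name.
    same_pairs = sum(c * (c - 1) for c in counts.values())
    return same_pairs / (n * (n - 1))


# Hypothetical responses for three naming scenarios (NOT data from the study).
scenarios = {
    "max_value": ["max", "maxValue", "max", "largest", "maximum"],
    "loop_index": ["i", "i", "i", "idx", "index"],
    "user_count": ["numUsers", "userCount", "count", "nUsers"],
}

probs = {name: same_name_probability(resp) for name, resp in scenarios.items()}
for name, p in probs.items():
    print(f"{name}: {p:.2f}")  # max_value: 0.10, loop_index: 0.30, user_count: 0.00
print("median:", median(probs.values()))  # median: 0.1
```

With more subjects per scenario and more scenarios, the same computation yields a per-instance agreement probability whose median summarizes how often developers converge on one name.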

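The three-step naming model can be pictured as a small pipeline: pick the concepts, map each concept to a word, then assemble the words into an identifier. The sketch below is a hypothetical rendering of that idea, not the authors' procedure; the function names, the concept-to-word `vocabulary`, and the choice of camelCase or snake_case in the last step are assumptions made for illustration. Selecting more concepts in step 1 directly yields a longer name in step 3, which mirrors the observation that using the model encourages more concepts and longer names.

```python
from typing import Iterable


def select_concepts(candidates: list[str], keep: Iterable[str]) -> list[str]:
    """Step 1 (hypothetical): choose which concepts the name should mention.

    Keeps the order of `candidates`, which stands in for the intended word order.
    """
    wanted = set(keep)
    return [c for c in candidates if c in wanted]


def choose_words(concepts: list[str], vocabulary: dict[str, str]) -> list[str]:
    """Step 2 (hypothetical): pick a word for each concept.

    Falls back to the concept label itself when the vocabulary has no entry.
    """
    return [vocabulary.get(c, c) for c in concepts]


def construct_name(words: list[str], style: str = "camel") -> str:
    """Step 3 (hypothetical): assemble the words into an identifier."""
    if not words:
        raise ValueError("need at least one word")
    lowered = [w.lower() for w in words]
    if style == "snake":
        return "_".join(lowered)
    if style == "camel":
        return lowered[0] + "".join(w.capitalize() for w in lowered[1:])
    raise ValueError(f"unknown style: {style}")


# Hypothetical scenario: a variable holding the number of active users.
candidates = ["count", "active", "user", "session"]
concepts = select_concepts(candidates, keep=["count", "active", "user"])
words = choose_words(concepts, vocabulary={"count": "num", "user": "users"})
print(construct_name(words))                 # numActiveUsers
print(construct_name(words, style="snake"))  # num_active_users
```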