Shorter identifier names take longer to comprehend

Developers spend the majority of their time reading code, a process in which identifier names play a key role. Although many identifier naming styles exist, they often lack an empirical basis, and it is not clear whether short or long identifier names facilitate comprehension. In this paper, we investigate the effect of different identifier naming styles (single letters, abbreviations, and words) on program comprehension. We conducted an experimental study with 72 professional C# developers who had to locate defects in source code snippets. We used a within-subjects design, so that each developer worked with all three identifier naming styles, and we measured the time it took them to find a defect. We found that developers located defects 19% faster with word identifiers than with meaningless single letters and abbreviations, but we did not find a difference between letters and abbreviations. The results of our study suggest that code is more difficult to comprehend when its identifier names consist only of letters and abbreviations. Words as identifier names facilitate program comprehension and may help to save costs and improve software quality.
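To make the three naming styles concrete, here is a small hypothetical function written three ways: with single-letter identifiers, with abbreviations, and with full words. It is not taken from the study, whose materials were C# snippets. Only the identifier names differ between the versions, so their behavior is identical; the function names `f`, `cnt_abv`, and `count_above` are invented for illustration.

```python
# Hypothetical illustration of the three identifier naming styles.
# These are NOT the study's snippets; only the names differ between versions.

def f(a, t):
    # Style 1: single letters
    c = 0
    for x in a:
        if x > t:
            c += 1
    return c


def cnt_abv(vals, thr):
    # Style 2: abbreviations
    cnt = 0
    for val in vals:
        if val > thr:
            cnt += 1
    return cnt


def count_above(values, threshold):
    # Style 3: full words
    count = 0
    for value in values:
        if value > threshold:
            count += 1
    return count


if __name__ == "__main__":
    data = [3, 8, 1, 9, 5]
    # All three versions compute the same result (how many items exceed 4).
    print(f(data, 4), cnt_abv(data, 4), count_above(data, 4))
```

Running the script prints `3 3 3`, since three of the values (8, 9, and 5) exceed the threshold of 4.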

Janet Siegmund et al., SANER 2017.