A Survey of the Forms of Java Reference Names

The readability of identifiers is a major factor in program comprehension and an aim of naming convention guidelines. Because of their semantic content, identifiers are also used in feature and bug location, among other software maintenance tasks. Looking at how names are used in practice may yield insights into potential problems for comprehension and for programming support tools that process identifiers. Class and method names are already well represented in the literature. This paper presents an investigation of Java field, formal argument and local variable names, which we collectively call reference names. These names cannot be ignored because they constitute over half the unique names and almost 70% of the name declarations in the corpus investigated. We analysed the forms of 3.5 million reference name declarations in 60 well-known Java projects, examining the phrasal structure of names composed of known words and acronyms. The structures found in practice were evaluated against those given in the literature. The use of unknown abbreviations and words, which may pose a problem for program comprehension, was also identified. Based on our observations of the rich diversity of reference names, we suggest issues to be taken into account in future academic research and in improving tools that rely on names as sources of information.
