How are identifiers named in open source software? About popularity and consistency

With the rapid growth of software project size and maintenance cost, adherence to coding standards, especially in identifier naming, is becoming a pressing concern for both computer science educators and software managers. Software developers rely mainly on identifier names to represent the knowledge recorded in source code. However, the popularity of identifier naming conventions, and the consistency with which they are adopted, have not yet been established in this field. Taking forty-eight popular open source projects written in three top-ranking programming languages (Java, C and C++) as examples, we developed an identifier extraction tool based on regular expression matching and used it to investigate both questions. Regarding popularity, we found that the Camel and Pascal naming conventions lead the way, while Hungarian notation is vanishing. Regarding consistency, we found that projects written in Java are much more consistent than those written in C and C++. Finally, we urge academia and the software industry to adopt the most popular naming conventions consistently in practice, so that identifier naming moves toward a standard, unified and high-quality practice.
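The abstract does not give the regular expressions or the consistency measure used by the extraction tool, so the following is only a minimal sketch of how regex-based extraction and convention classification might look. All patterns, the keyword list, the Hungarian prefix set, and the function names (`extract_identifiers`, `classify`, `dominant_share`) are assumptions made for illustration, and `dominant_share` is a stand-in consistency measure, not the one used in the study.

```python
import re
from collections import Counter

# Hypothetical, deliberately small keyword list; a real tool would need the
# full keyword set of each language (Java, C, C++).
KEYWORDS = {"int", "char", "void", "return", "if", "else", "for", "while",
            "class", "public", "private", "static", "new"}

# Assumed patterns, checked in order. Hungarian is tested first because a
# Hungarian name such as "szName" would otherwise also match the Camel pattern.
CONVENTIONS = [
    ("hungarian", re.compile(r"^(?:sz|str|lp|dw|fn|p|b|n|i|h)[A-Z][A-Za-z0-9]*$")),
    ("camel", re.compile(r"^[a-z][a-z0-9]*(?:[A-Z][a-z0-9]*)+$")),
    ("pascal", re.compile(r"^(?:[A-Z][a-z0-9]+)+$")),
    ("underscore", re.compile(r"^[a-z][a-z0-9]*(?:_[a-z0-9]+)+$")),
    ("upper", re.compile(r"^[A-Z][A-Z0-9]*(?:_[A-Z0-9]+)*$")),
]


def extract_identifiers(source):
    """Return identifier-like tokens in order of appearance, minus keywords.

    This ignores comments and string literals, which a real tool must handle.
    """
    tokens = re.findall(r"\b[A-Za-z_][A-Za-z0-9_]*\b", source)
    return [t for t in tokens if t not in KEYWORDS]


def classify(name):
    """Return the first matching convention label, or "other"."""
    for label, pattern in CONVENTIONS:
        if pattern.match(name):
            return label
    return "other"


def dominant_share(names):
    """Assumed consistency proxy: share of names in the most common convention."""
    counts = Counter(classify(n) for n in names)
    return counts.most_common(1)[0][1] / len(names)


if __name__ == "__main__":
    names = extract_identifiers("int pCount = parseInput(max_size);")
    for name in names:
        print(name, classify(name))
```

Single lowercase words such as `count` fall into "other" here because they are compatible with several conventions at once; how the original study treats such names is not stated in the abstract.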
