Fault-Prone Java Method Analysis Focusing on Pair of Local Variables with Confusing Names

Naming a local variable is usually left to the programmer's discretion. Because names depend on each programmer's preferences and experience, they vary considerably between individuals, and this variation may affect aspects of code quality such as readability. Although the naming of local variables has been studied in the past, the relationship among the names of local variables within the same method (function) has not been well discussed. This paper focuses on pairs of local variables with similar, confusing names, e.g., "lineIndex" vs. "lineIndent." Because such variables are easily mistaken for each other, the presence of a confusing pair may be related to the fault-proneness of the method. An empirical analysis of five major open-source Java projects yields the following results: (1) a method having a confusing variable pair is about 1.1 to 2.6 times more fault-prone than a method having only dissimilar (non-confusing) pairs; (2) the proposed metric of how confusing a method's local variables are is as good as or better than the conventional cyclomatic complexity at predicting fault-prone methods.
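The abstract does not define how the similarity of two names is measured. Purely as an illustration, the Java sketch below assumes a normalized Levenshtein (edit) distance: similarity = 1 - distance / (length of the longer name). The class name `ConfusingNames`, the methods `similarity`, `maxPairSimilarity` and `confusingPairs`, and the 0.7 threshold are all hypothetical; in particular, `maxPairSimilarity` is only a stand-in for a method-level score and is not the metric proposed in the paper.

```java
import java.util.ArrayList;
import java.util.List;

/**
 * Sketch: flag pairs of local-variable names that look confusingly similar.
 * ASSUMPTION: similarity = 1 - Levenshtein distance / length of the longer name.
 * The paper's actual measure, threshold, and metric may differ.
 */
public class ConfusingNames {

    /** Dynamic-programming Levenshtein distance using two rolling rows. */
    public static int levenshtein(String a, String b) {
        int[] prev = new int[b.length() + 1];
        int[] curr = new int[b.length() + 1];
        for (int j = 0; j <= b.length(); j++) prev[j] = j; // distance from "" to b[0..j)
        for (int i = 1; i <= a.length(); i++) {
            curr[0] = i; // distance from a[0..i) to ""
            for (int j = 1; j <= b.length(); j++) {
                int cost = a.charAt(i - 1) == b.charAt(j - 1) ? 0 : 1;
                curr[j] = Math.min(Math.min(curr[j - 1] + 1,   // insertion
                                            prev[j] + 1),      // deletion
                                   prev[j - 1] + cost);        // substitution or match
            }
            int[] tmp = prev; prev = curr; curr = tmp; // prev now holds row i
        }
        return prev[b.length()];
    }

    /** Normalized similarity in [0, 1]; 1.0 means the names are identical. */
    public static double similarity(String a, String b) {
        int maxLen = Math.max(a.length(), b.length());
        if (maxLen == 0) return 1.0;
        return 1.0 - (double) levenshtein(a, b) / maxLen;
    }

    /** HYPOTHETICAL method-level score: highest similarity over all pairs of locals. */
    public static double maxPairSimilarity(List<String> locals) {
        double max = 0.0;
        for (int i = 0; i < locals.size(); i++)
            for (int j = i + 1; j < locals.size(); j++)
                max = Math.max(max, similarity(locals.get(i), locals.get(j)));
        return max;
    }

    /** Pairs whose similarity reaches the (hypothetical) threshold are reported as confusing. */
    public static List<String[]> confusingPairs(List<String> locals, double threshold) {
        List<String[]> result = new ArrayList<>();
        for (int i = 0; i < locals.size(); i++)
            for (int j = i + 1; j < locals.size(); j++)
                if (similarity(locals.get(i), locals.get(j)) >= threshold)
                    result.add(new String[] {locals.get(i), locals.get(j)});
        return result;
    }

    public static void main(String[] args) {
        // Local variable names of one (imaginary) method.
        List<String> locals = List.of("lineIndex", "lineIndent", "buffer");
        System.out.println("max pair similarity = " + maxPairSimilarity(locals));
        for (String[] p : confusingPairs(locals, 0.7)) {
            System.out.println("confusing pair: " + p[0] + " / " + p[1]);
        }
    }
}
```

With these inputs, "lineIndex" and "lineIndent" have an edit distance of 2 and the longer name has 10 characters, so their similarity is 1 - 2/10 = 0.8, which the hypothetical 0.7 threshold flags; "buffer" is not paired with either name.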
