Paper Information - Fault-Prone Java Method Analysis Focusing on Pair of Local Variables with Confusing Names

Naming a local variable is usually left to the programmer's discretion. Because the choice depends on the programmer's preference and experience, names vary considerably between individuals, which can cause variability in code quality attributes such as readability. Although local variable naming has been studied before, the relationship among the names of local variables within a single method (function) has not been well discussed. This paper focuses on pairs of local variables with similar, confusing names, e.g., "lineIndex" vs. "lineIndent". Because such variables are easily mistaken for each other, the presence of a confusing pair may be related to the fault-proneness of the method. An empirical analysis of five major open-source Java projects is conducted, and the following results are reported: (1) a method having a confusing variable pair is about 1.1 to 2.6 times more fault-prone than a method having only dissimilar (non-confusing) pairs; (2) the proposed metric of how confusing the local variables are is equivalent to or better than the conventional cyclomatic complexity in predicting fault-prone methods.
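To make the notion of a "confusing pair" concrete, the following is a minimal sketch of how similar local variable names within one method might be flagged. The abstract does not specify the paper's similarity metric, so this sketch substitutes a normalized Levenshtein (edit) distance as an assumption. The class name `ConfusingNames`, its method names, and the 0.7 threshold are all hypothetical and chosen only for illustration.

```java
import java.util.ArrayList;
import java.util.List;

// Illustrative sketch only: flags pairs of local variable names that are
// close in edit distance. This is NOT the paper's metric (assumption).
public class ConfusingNames {

    // Classic Levenshtein edit distance using two rolling rows.
    public static int levenshtein(String a, String b) {
        int[] prev = new int[b.length() + 1];
        int[] curr = new int[b.length() + 1];
        for (int j = 0; j <= b.length(); j++) {
            prev[j] = j; // distance from the empty prefix of a
        }
        for (int i = 1; i <= a.length(); i++) {
            curr[0] = i;
            for (int j = 1; j <= b.length(); j++) {
                int cost = a.charAt(i - 1) == b.charAt(j - 1) ? 0 : 1;
                curr[j] = Math.min(
                        Math.min(curr[j - 1] + 1, prev[j] + 1),
                        prev[j - 1] + cost);
            }
            int[] tmp = prev;
            prev = curr;
            curr = tmp;
        }
        return prev[b.length()];
    }

    // Similarity in [0, 1]: 1.0 means identical names.
    public static double similarity(String a, String b) {
        int maxLen = Math.max(a.length(), b.length());
        return maxLen == 0 ? 1.0 : 1.0 - (double) levenshtein(a, b) / maxLen;
    }

    // Returns every distinct, non-identical pair whose similarity
    // reaches the (hypothetical) threshold.
    public static List<String[]> confusingPairs(List<String> names, double threshold) {
        List<String[]> pairs = new ArrayList<>();
        for (int i = 0; i < names.size(); i++) {
            for (int j = i + 1; j < names.size(); j++) {
                String x = names.get(i);
                String y = names.get(j);
                if (!x.equals(y) && similarity(x, y) >= threshold) {
                    pairs.add(new String[] {x, y});
                }
            }
        }
        return pairs;
    }

    public static void main(String[] args) {
        // Local variable names as they might appear in a single method.
        List<String> locals = List.of("lineIndex", "lineIndent", "count");
        for (String[] p : confusingPairs(locals, 0.7)) {
            System.out.println(p[0] + " ~ " + p[1]);
        }
    }
}
```

With the abstract's own example, "lineIndex" and "lineIndent" differ by two edits over a maximum length of ten, giving a similarity of 0.8 under this assumed measure. A real analysis would need to extract local variable names per method from parsed source and would likely use a more refined notion of confusability than raw edit distance.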
