Threshold Extraction Framework for Software Metrics

Software metrics are used to measure different attributes of software. To practically measure software attributes using these metrics, metric thresholds are needed. Many researchers attempted to identify these thresholds based on personal experiences. However, the resulted experience-based thresholds cannot be generalized due to the variability in personal experiences and the subjectivity of opinions. The goal of this paper is to propose an automated clustering framework based on the expectation maximization (EM) algorithm where clusters are generated using a simplified 3-metric set (LOC, LCOM, and CBO). Given these clusters, different threshold levels for software metrics are systematically determined such that each threshold reflects a specific level of software quality. The proposed framework comprises two major steps: the clustering step where the software quality historical dataset is decomposed into a fixed set of clusters using the EM algorithm, and the threshold extraction step where thresholds, specific to each software metric in the resulting clusters, are estimated using statistical measures such as the mean (μ) and the standard deviation (σ) of each software metric in each cluster. The paper’s findings highlight the capability of EM-based clustering, using a minimum metric set, to group software quality datasets according to different quality levels.

[1]  Oral Alan,et al.  Class noise detection based on software metrics and ROC curves , 2011, Inf. Sci..

[2]  Ewan D. Tempero,et al.  Understanding the shape of Java software , 2006, OOPSLA '06.

[3]  C.J.H. Mann,et al.  Object-Oriented Metrics in Practice: Using Software Metrics to Characterize, Evaluate, and Improve the Design of Object-Oriented Systems , 2007 .

[4]  Michele Marchesi,et al.  Power-Laws in a Large Object-Oriented Software System , 2007, IEEE Transactions on Software Engineering.

[5]  Tiago L. Alves,et al.  Deriving metric thresholds from benchmark data , 2010, 2010 IEEE International Conference on Software Maintenance.

[6]  Chuong B Do,et al.  What is the expectation maximization algorithm? , 2008, Nature Biotechnology.

[7]  Steve Counsell,et al.  Power law distributions in class relationships , 2003, Proceedings Third IEEE International Workshop on Source Code Analysis and Manipulation.

[8]  Anas N. Al-Rabadi,et al.  A comparison of modified reconstructability analysis and Ashenhurst‐Curtis decomposition of Boolean functions , 2004 .

[9]  Brian A. Nejmeh,et al.  NPATH: a measure of execution path complexity and its applications , 1988, CACM.

[10]  Chris F. Kemerer,et al.  A Metrics Suite for Object Oriented Design , 2015, IEEE Trans. Software Eng..

[11]  Roberto da Silva Bigonha,et al.  Identifying thresholds for object-oriented software metrics , 2012, J. Syst. Softw..

[12]  ConcasGiulio,et al.  Power-Laws in a Large Object-Oriented Software System , 2007 .

[13]  Hongyu Zhang,et al.  An investigation of the relationships between lines of code and defects , 2009, 2009 IEEE International Conference on Software Maintenance.

[14]  Marian Jureczko,et al.  Using Object-Oriented Design Metrics to Predict Software Defects 1* , 2010 .

[15]  Eduardo Figueiredo,et al.  TDTool: threshold derivation tool , 2016, EASE.

[16]  Claus Lewerentz,et al.  Applying design-metrics to object-oriented frameworks , 1996, Proceedings of the 3rd International Software Metrics Symposium.

[17]  Charles Elkan,et al.  Expectation Maximization Algorithm , 2010, Encyclopedia of Machine Learning.

[18]  Paul W. Oman,et al.  The application of software maintainability models in industrial software systems , 1995, J. Syst. Softw..

[19]  Xiao Liu,et al.  An empirical study on software defect prediction with a simplified metric set , 2014, Inf. Softw. Technol..

[20]  Khaled El Emam,et al.  The Optimal Class Size for Object-Oriented Software , 2002, IEEE Trans. Software Eng..

[21]  Raed Shatnawi,et al.  Finding software metrics threshold values using ROC curves , 2010, J. Softw. Maintenance Res. Pract..

[22]  Khaled El Emam,et al.  Thresholds for object-oriented measures , 2000, Proceedings 11th International Symposium on Software Reliability Engineering. ISSRE 2000.

[23]  Narendra Sharma,et al.  Comparison the various clustering algorithms of weka tools , 2012 .

[24]  Lech Madeyski,et al.  Towards identifying software project clusters with regard to defect prediction , 2010, PROMISE '10.

[25]  M. Lipow,et al.  Number of Faults per Line of Code , 1982, IEEE Transactions on Software Engineering.

[26]  Eduardo Figueiredo,et al.  A Method to Derive Metric Thresholds for Software Product Lines , 2015, 2015 29th Brazilian Symposium on Software Engineering.

[27]  Pawel Lewicki,et al.  Statistics : methods and applications : a comprehensive reference for science, industry, and data mining , 2006 .

[28]  Jens Grabowski,et al.  Calculation and optimization of thresholds for sets of software metrics , 2011, Empirical Software Engineering.

[29]  Eduardo Figueiredo,et al.  Detecting Code Smells in Software Product Lines -- An Exploratory Study , 2015, 2015 12th International Conference on Information Technology - New Generations.

[30]  Rüdiger Lincke,et al.  Comparing software metrics tools , 2008, ISSTA '08.

[31]  Marco Tulio Valente,et al.  Extracting relative thresholds for source code metrics , 2014, 2014 Software Evolution Week - IEEE Conference on Software Maintenance, Reengineering, and Reverse Engineering (CSMR-WCRE).

[32]  Brian Henderson-Sellers,et al.  Object-oriented metrics: measures of complexity , 1995 .