Measuring the effect of clone refactoring on the size of unit test cases in object-oriented software: an empirical study

This paper empirically measures the effect of clone refactoring on the size of unit test cases in object-oriented software. We investigated four research questions concerning: (1) the impact of clone refactoring on source code attributes related to the testability of classes (particularly size, complexity and coupling), (2) the impact of clone refactoring on the size of unit test cases, (3) the correlations between the variations observed after clone refactoring in the source code attributes and in the size of unit test cases, and (4) which variations in the source code attributes after clone refactoring are most associated with the size of unit test cases. We used different metrics to quantify the considered source code attributes and the size of unit test cases. To investigate the research questions and to develop predictive and explanatory models, we used various data analysis and modeling techniques, particularly linear regression analysis and five machine learning algorithms (C4.5, KNN, Naïve Bayes, Random Forest and Support Vector Machine). We conducted an empirical study using data collected from two open-source Java software systems (ANT and ARCHIVA) that have been clone refactored. The paper's contributions can be summarized as follows: (1) the results revealed a strong, positive correlation between code clone refactoring and the reduction in the size of unit test cases; (2) we showed that code quality attributes related to the testability of classes are significantly improved when clones are refactored; (3) we observed that the size of unit test cases can be significantly reduced when clone refactoring is applied; and (4) complexity and size measures are more commonly associated with variations in the size of unit test cases than coupling measures are.
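The abstract does not name the specific metrics or the correlation coefficient used. As a minimal sketch of the kind of analysis behind research questions (3) and (4), the Python snippet below computes per-class variations (after minus before refactoring) of a hypothetical class-size metric (`LOC`) and a hypothetical test-size metric (`TLOC`), then reports Spearman's rank correlation and an ordinary least-squares fit between the two variations. The metric names, class names and values are illustrative assumptions, not data from the study, and Spearman's coefficient is one plausible choice rather than the authors' confirmed method.

```python
"""Sketch: correlate per-class metric variations after clone refactoring.

All class names, metric names (LOC for class size, TLOC for test size)
and values below are hypothetical and serve only as an illustration.
"""
from statistics import mean


def variation(before: dict[str, float], after: dict[str, float]) -> dict[str, float]:
    """Return after - before for every class measured in both versions."""
    return {cls: after[cls] - before[cls] for cls in before if cls in after}


def average_ranks(values: list[float]) -> list[float]:
    """Rank values from 1..n, giving tied values the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the group of values tied with values[order[i]].
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # 1-based average position of the tie group
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def pearson(x: list[float], y: list[float]) -> float:
    """Pearson product-moment correlation of two equal-length samples."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for constant input")
    return sxy / (sxx * syy) ** 0.5


def spearman(x: list[float], y: list[float]) -> float:
    """Spearman's rho: Pearson correlation computed on average ranks."""
    return pearson(average_ranks(x), average_ranks(y))


def ols_slope_intercept(x: list[float], y: list[float]) -> tuple[float, float]:
    """Least-squares fit y = slope * x + intercept (simple linear regression)."""
    mx, my = mean(x), mean(y)
    slope = sum((a - mx) * (b - my) for a, b in zip(x, y)) / sum((a - mx) ** 2 for a in x)
    return slope, my - slope * mx


if __name__ == "__main__":
    # Hypothetical per-class measurements before and after clone refactoring.
    loc_before = {"A": 120, "B": 300, "C": 80, "D": 210, "E": 150}
    loc_after = {"A": 100, "B": 220, "C": 80, "D": 170, "E": 140}
    tloc_before = {"A": 60, "B": 140, "C": 40, "D": 90, "E": 70}
    tloc_after = {"A": 52, "B": 100, "C": 40, "D": 75, "E": 66}

    d_loc = variation(loc_before, loc_after)
    d_tloc = variation(tloc_before, tloc_after)
    classes = sorted(d_loc.keys() & d_tloc.keys())
    x = [d_loc[c] for c in classes]
    y = [d_tloc[c] for c in classes]

    print(f"Spearman rho = {spearman(x, y):.3f}")
    slope, intercept = ols_slope_intercept(x, y)
    print(f"dTLOC ~ {slope:.4f} * dLOC + {intercept:.4f}")
```

Spearman's coefficient is used in this sketch because it only assumes a monotonic relationship between the two variations, which suits skewed software metrics. The machine learning models named in the abstract (C4.5, KNN, Naïve Bayes, Random Forest and SVM) are outside the scope of this illustration.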
