Application of LSSVM and SMOTE on Seven Open Source Projects for Predicting Refactoring at Class Level

Source code refactoring consists of modifying the structure of source code without changing its functionality or external behavior. We present a method to predict refactoring candidates at the class level, which can help developers improve the design and structure of source code while preserving its behavior. The method is based on a machine learning framework: we use Least Squares Support Vector Machines (LS-SVM) as the learning algorithm, Principal Component Analysis (PCA) as a feature extraction technique, and Synthetic Minority Over-sampling Technique (SMOTE) to handle imbalanced data. We start with 102 source code metrics as input features, which are reduced to 31 features after removing irrelevant and redundant features through statistical tests. We conduct a series of experiments on a publicly available software engineering dataset consisting of seven open-source software systems in which the refactored classes are manually validated. We apply LS-SVM with three different kernel functions: linear, polynomial, and Radial Basis Function (RBF). Statistical significance tests demonstrate that the RBF kernel outperforms the linear and polynomial kernels, while there is no statistically significant difference between the performance of the linear and polynomial kernels. The tests also reveal that models trained with SMOTE outperform those trained without SMOTE, and that using all metrics outperforms using PCA-based metrics. The mean Area Under Curve (AUC) for LS-SVM with the RBF kernel is 0.96.
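
To make the two central steps concrete, the following is a minimal sketch, not the authors' implementation, of SMOTE-style oversampling followed by an LS-SVM classifier with an RBF kernel. It uses only the Python standard library. All function names (`smote`, `lssvm_train`, `lssvm_predict`), the hyperparameters `gamma` and `sigma`, and the two-dimensional toy data are assumptions made for illustration; the study itself works with 31 source code metrics per class, and the PCA step is omitted here.

```python
import math
import random

def rbf(a, b, sigma=1.0):
    """RBF kernel: exp(-||a - b||^2 / (2 * sigma^2))."""
    d2 = sum((p - q) ** 2 for p, q in zip(a, b))
    return math.exp(-d2 / (2.0 * sigma ** 2))

def smote(minority, n_new, k=2, rng=None):
    """Create n_new synthetic points, each interpolated between a minority
    sample and one of its k nearest minority neighbours."""
    rng = rng or random.Random(0)
    synthetic = []
    for _ in range(n_new):
        x = rng.choice(minority)
        neighbours = sorted((m for m in minority if m is not x),
                            key=lambda m: math.dist(x, m))[:k]
        nb = rng.choice(neighbours)
        gap = rng.random()  # position along the segment from x to nb
        synthetic.append([xi + gap * (ni - xi) for xi, ni in zip(x, nb)])
    return synthetic

def solve(A, rhs):
    """Solve A x = rhs by Gaussian elimination with partial pivoting."""
    n = len(rhs)
    M = [row[:] + [r] for row, r in zip(A, rhs)]
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(M[r][c]))
        M[c], M[p] = M[p], M[c]
        for r in range(c + 1, n):
            f = M[r][c] / M[c][c]
            for j in range(c, n + 1):
                M[r][j] -= f * M[c][j]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        s = sum(M[r][j] * x[j] for j in range(r + 1, n))
        x[r] = (M[r][n] - s) / M[r][r]
    return x

def lssvm_train(X, y, gamma=10.0, sigma=1.0):
    """LS-SVM classifier: solve the linear system
    [[0, y^T], [y, Omega + I/gamma]] [b; alpha] = [0; 1],
    where Omega[i][j] = y_i * y_j * K(x_i, x_j)."""
    n = len(X)
    A = [[0.0] + [float(v) for v in y]]
    for i in range(n):
        row = [float(y[i])]
        for j in range(n):
            ridge = 1.0 / gamma if i == j else 0.0
            row.append(y[i] * y[j] * rbf(X[i], X[j], sigma) + ridge)
        A.append(row)
    sol = solve(A, [0.0] + [1.0] * n)
    return sol[0], sol[1:]  # bias b, multipliers alpha

def lssvm_predict(x, X, y, b, alpha, sigma=1.0):
    """Sign of sum_i alpha_i * y_i * K(x, x_i) + b."""
    score = sum(a * yi * rbf(x, xi, sigma)
                for a, yi, xi in zip(alpha, y, X)) + b
    return 1 if score >= 0 else -1

# Hypothetical toy data: label +1 = refactored class, -1 = not refactored.
majority = [[0.0, 0.0], [0.2, 0.1], [0.1, 0.3],
            [0.3, 0.2], [0.0, 0.4], [0.4, 0.0]]
minority = [[2.0, 2.0], [2.2, 1.9], [1.9, 2.3]]

# Balance the classes with synthetic minority samples, then train.
synthetic = smote(minority, n_new=len(majority) - len(minority))
X = majority + minority + synthetic
y = [-1] * len(majority) + [1] * (len(minority) + len(synthetic))
b, alpha = lssvm_train(X, y)

print(lssvm_predict([2.1, 2.0], X, y, b, alpha))
print(lssvm_predict([0.1, 0.1], X, y, b, alpha))
```

Unlike a standard SVM, which requires solving a quadratic program, LS-SVM training reduces to a single linear system, which is why a plain Gaussian elimination routine suffices in this sketch. In practice a numerical library would replace `solve`, and oversampling would be applied to the training folds only so that synthetic samples do not leak into evaluation.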
