Comparison of Multi-Label Classification Algorithms for Code Smell Detection

Code smells in source code indicate weaknesses in design or implementation. Several tools have been developed to detect them. However, these tools generally produce different results, because code smells are subjectively interpreted, informally defined, configured by developers, domain-dependent, and based on opinions and experience. To cope with these issues, in this paper we use machine learning techniques, specifically multi-label classification methods, to determine whether a given piece of source code is affected by more than one code smell. We conducted experiments on four code smell datasets, which we transformed into two multi-label datasets (one at the method level and one at the class level). Two multi-label classification methods (Classifier Chains and Label Combination) and their ensemble models were applied to the converted datasets using five different base classifiers. The results show that, as a base classifier, the Random Forest algorithm performs better than the Decision Tree, Naive Bayes, Support Vector Machine, and Neural Network algorithms.
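As an illustration of the two problem-transformation methods named above, the following is a minimal sketch of how Label Combination and Classifier Chains reshape a multi-label dataset before any base classifier is trained. It is not the authors' implementation. The function names, the two metric features, and the two smell labels are hypothetical and chosen only for the example.

```python
from typing import Dict, List, Sequence, Tuple

LabelVector = Tuple[int, ...]


def label_combination_encode(
    Y: Sequence[Sequence[int]],
) -> Tuple[List[int], Dict[LabelVector, int]]:
    """Label Combination: treat each distinct label vector as one class.

    The multi-label problem becomes a single multi-class problem that any
    base classifier can be trained on.
    """
    combo_to_class: Dict[LabelVector, int] = {}
    encoded: List[int] = []
    for labels in Y:
        key = tuple(labels)
        if key not in combo_to_class:
            combo_to_class[key] = len(combo_to_class)
        encoded.append(combo_to_class[key])
    return encoded, combo_to_class


def label_combination_decode(
    class_ids: Sequence[int], combo_to_class: Dict[LabelVector, int]
) -> List[LabelVector]:
    """Map predicted class ids back to label vectors."""
    class_to_combo = {c: combo for combo, c in combo_to_class.items()}
    return [class_to_combo[c] for c in class_ids]


def chain_training_sets(
    X: Sequence[Sequence[float]], Y: Sequence[Sequence[int]]
) -> List[Tuple[List[List[float]], List[int]]]:
    """Classifier Chains: build one binary problem per label.

    The j-th problem sees the original features plus the true values of
    labels 0..j-1, so later classifiers can exploit label dependencies.
    """
    n_labels = len(Y[0])
    problems = []
    for j in range(n_labels):
        X_j = [list(x) + list(y[:j]) for x, y in zip(X, Y)]
        y_j = [y[j] for y in Y]
        problems.append((X_j, y_j))
    return problems


# Hypothetical method-level data: two metric values per method and two
# illustrative smell labels (smell_a, smell_b). Not taken from the paper.
X = [[120.0, 3.0], [15.0, 9.0], [200.0, 11.0], [10.0, 1.0]]
Y = [(1, 0), (0, 1), (1, 1), (0, 0)]

encoded, mapping = label_combination_encode(Y)
decoded = label_combination_decode(encoded, mapping)
problems = chain_training_sets(X, Y)
```

In this sketch, each transformed problem could then be handed to any of the base classifiers the abstract lists. At prediction time a classifier chain would feed its own earlier predictions, rather than the true labels, into the later links.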
