Comparing and experimenting machine learning techniques for code smell detection

Several code smell detection tools have been developed, and they provide different results, because smells can be interpreted subjectively and hence detected in different ways. In this paper, we perform what is, to the best of our knowledge, the largest experiment applying machine learning algorithms to code smell detection. We experiment with 16 different machine learning algorithms on four code smells (Data Class, Large Class, Feature Envy, Long Method) and 74 software systems, with 1986 manually validated code smell samples. We found that all algorithms achieved high performance on the cross-validation data set; the highest performance was obtained by J48 and Random Forest, while the worst was achieved by support vector machines. However, the lower prevalence of code smells in the entire data set, i.e., imbalanced data, caused varying performance that needs to be addressed in future studies. We conclude that applying machine learning to the detection of these code smells can provide high accuracy (>96%), and that only a hundred training examples are needed to reach at least 95% accuracy.
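The imbalance point above can be made concrete with a minimal Python sketch. It is not the authors' experimental pipeline. The example assumes a hypothetical data set of 1000 code elements in which 5% are labelled smelly (label 1) and the rest clean (label 0). The helper names `confusion_counts`, `accuracy_and_f1` and `stratified_folds` are also hypothetical. The sketch shows two things. First, a trivial baseline that never reports a smell still reaches 95% accuracy but gets an F1 score of 0 on the smelly class, which is why accuracy alone can overstate detection quality when smells are rare. Second, stratified k-fold splitting, one common way to set up cross-validation, keeps the smell prevalence the same in every fold.

```python
def confusion_counts(y_true, y_pred, positive=1):
    """Count TP, FP, FN, TN, treating `positive` as the smelly class."""
    tp = fp = fn = tn = 0
    for truth, pred in zip(y_true, y_pred):
        if pred == positive and truth == positive:
            tp += 1  # smell correctly detected
        elif pred == positive:
            fp += 1  # clean element flagged as smelly
        elif truth == positive:
            fn += 1  # smell missed
        else:
            tn += 1  # clean element correctly left alone
    return tp, fp, fn, tn


def accuracy_and_f1(y_true, y_pred, positive=1):
    """Return (accuracy, F1 on the positive/smelly class)."""
    tp, fp, fn, tn = confusion_counts(y_true, y_pred, positive)
    accuracy = (tp + tn) / len(y_true)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return accuracy, f1


def stratified_folds(labels, k):
    """Split sample indices into k folds that preserve the class proportions.

    Indices are grouped by class and then dealt out round-robin with a single
    running counter, so per-class counts and fold sizes differ by at most one.
    """
    by_class = {}
    for index, label in enumerate(labels):
        by_class.setdefault(label, []).append(index)
    folds = [[] for _ in range(k)]
    position = 0
    for label in sorted(by_class):
        for index in by_class[label]:
            folds[position % k].append(index)
            position += 1
    return folds


if __name__ == "__main__":
    # Hypothetical data set: 1000 code elements, 5% of them labelled smelly.
    labels = [1] * 50 + [0] * 950
    baseline = [0] * len(labels)  # always predicts "no smell"
    print(accuracy_and_f1(labels, baseline))  # (0.95, 0.0)
    # Number of smelly elements in each of 10 stratified folds.
    print([sum(labels[i] for i in fold) for fold in stratified_folds(labels, 10)])
    # [5, 5, 5, 5, 5, 5, 5, 5, 5, 5]
```

Reporting F1, or precision and recall, on the smelly class alongside accuracy is one way to expose the effect of low prevalence. Stratification only keeps each fold's class ratio stable; it does not by itself correct the imbalance.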
