Detecting code smells using machine learning techniques: Are we there yet?

Code smells are symptoms of poor design and implementation choices that weigh heavily on the quality of the produced source code. Over the last decades, several code smell detection tools have been proposed. However, the literature shows that the results of these tools can be subjective and are intrinsically tied to the nature and approach of the detection. A recent work proposed the use of Machine Learning (ML) techniques for code smell detection, potentially addressing the subjectivity of tools by giving a learner the ability to discern between smelly and non-smelly source code elements. While this work opened a new perspective for code smell detection, it only considered the case where each dataset used to train and test the machine learners contains instances affected by a single type of smell. In this work, we replicate the study with a different dataset configuration containing instances of more than one type of smell. The results show that, under this configuration, the machine learning techniques exhibit critical limitations, exposing gaps in the state of the art that deserve further research.
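To make the difference between the two dataset configurations concrete, the sketch below builds a training set for one target smell under each. It is an illustration under stated assumptions, not the replication's actual pipeline: the `Instance` record, the metric values, and the helper names are hypothetical, and we assume that in the multi-smell configuration elements affected by other smells are simply labelled as negatives for the target smell.

```python
from collections import Counter
from typing import Dict, List, NamedTuple, Optional, Tuple


class Instance(NamedTuple):
    """Hypothetical code element: name, metric vector, and smell type (None if clean)."""
    name: str
    metrics: Tuple[float, ...]
    smell: Optional[str]


Dataset = List[Tuple[Tuple[float, ...], int]]


def single_smell_dataset(instances: List[Instance], target: str) -> Dataset:
    """Single-smell configuration: keep only `target`-smelly (label 1) and clean (label 0) elements."""
    return [(i.metrics, int(i.smell == target))
            for i in instances if i.smell in (target, None)]


def multi_smell_dataset(instances: List[Instance], target: str) -> Dataset:
    """Multi-smell configuration (assumed): keep every element; other smells become label 0."""
    return [(i.metrics, int(i.smell == target)) for i in instances]


def label_counts(dataset: Dataset) -> Dict[int, int]:
    """Count how many positives (1) and negatives (0) a dataset contains."""
    return dict(Counter(label for _, label in dataset))


# Toy corpus of class-level elements; metric values are invented for illustration.
corpus = [
    Instance("A", (120.0, 35.0), "God Class"),
    Instance("B", (15.0, 3.0), None),
    Instance("C", (40.0, 2.0), "Data Class"),
    Instance("D", (10.0, 2.0), None),
    Instance("E", (95.0, 40.0), "God Class"),
    Instance("F", (30.0, 1.0), "Data Class"),
]

if __name__ == "__main__":
    print("single-smell:", label_counts(single_smell_dataset(corpus, "God Class")))
    print("multi-smell: ", label_counts(multi_smell_dataset(corpus, "God Class")))
```

In this toy example the single-smell God Class dataset holds 2 positives and 2 negatives, while in the multi-smell configuration the two Data Class elements join the negatives (2 positives, 4 negatives). The learner must then separate God Class instances not only from clean elements but also from elements affected by other smells, which is one plausible reason why such a configuration is harder.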
