Identifying Utility Functions Using Random Forests

Utility functions are general-purpose functions that are useful in many parts of a system. To facilitate reuse, they are usually implemented in dedicated libraries. However, developers frequently miss opportunities to implement general-purpose functions in utility libraries, which reduces the chances of reuse. In this paper, we describe our ongoing investigation into using Random Forest classifiers to automatically identify utility functions. Using a list of static source code metrics, we train a classifier to identify such functions, both in Java (using 84 projects from the Qualitas Corpus) and in JavaScript (using 22 popular projects from GitHub). For Java, we achieve the following median results: 0.90 (AUC), 0.83 (precision), 0.88 (recall), and 0.84 (F-measure). For JavaScript, the median results are 0.80 (AUC), 0.75 (precision), 0.89 (recall), and 0.76 (F-measure).
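For readers less familiar with the four reported measures, the following is a minimal sketch of how they can be computed for a binary "utility function or not" classifier. It is not the evaluation code used in the paper: the function names `precision_recall_f1` and `auc` and the toy labels and scores are assumptions made for illustration only.

```python
from typing import Sequence, Tuple


def precision_recall_f1(y_true: Sequence[int], y_pred: Sequence[int]) -> Tuple[float, float, float]:
    """Precision, recall, and F-measure for the positive class (1 = utility function)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    # F-measure is the harmonic mean of precision and recall.
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1


def auc(y_true: Sequence[int], scores: Sequence[float]) -> float:
    """AUC as the probability that a random positive example is scored
    above a random negative one (ties count as one half)."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative example")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Hypothetical toy data: 1 = utility function, 0 = other function.
y_true = [1, 1, 1, 0, 0, 0]
y_pred = [1, 1, 0, 1, 0, 0]               # hard predictions from some classifier
scores = [0.9, 0.8, 0.4, 0.7, 0.3, 0.2]   # predicted probability of class 1

print(precision_recall_f1(y_true, y_pred))
print(auc(y_true, scores))
```

Because the abstract reports medians, presumably taken separately for each measure, the reported F-measure need not equal the harmonic mean of the reported median precision and recall.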
