Tweaking Association Rules to Optimize Software Change Recommendations

Past research has sought to recommend artifacts that are likely to change together in a task, in order to assist developers in making changes to a software system, often using techniques such as association rules. Association rule learning is a data mining technique that has frequently been used to discover evolutionary couplings, which are a fundamental building block of modern change prediction techniques. However, using association rules to detect evolutionary coupling requires setting a number of configuration parameters, such as the measures of interest (e.g., support and confidence), their cut-off values, and the portion of the commit history from which co-change relationships are extracted. To set these parameters, researchers have to carry out empirical studies for each project, testing a few variations before choosing a configuration. This makes association rules difficult to use in practice, since developers would need to run experiments before applying the technique and would end up choosing suboptimal configurations that lead to wrong predictions. In this paper, we propose a fitness function for a Genetic Algorithm that optimizes co-change recommendations, and we evaluate it on five open source projects (CPython, Django, Laravel, Shiny and Gson). The results indicate that our genetic algorithm is able to find optimized cut-off values for support and confidence, as well as to determine which length of commit history yields the best recommendations. We also find that, for projects with a shorter commit history (5k commits), our approach produced better results than the regression function proposed in the literature. This result is particularly encouraging because repositories such as GitHub host many young projects. Our results can be used by researchers conducting co-change prediction studies and by tool developers building automated support for practitioners.
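
To make the parameters named above concrete, the following minimal Python sketch mines pairwise co-change rules from a toy commit history and filters them by support and confidence cut-offs. It is an illustration under stated assumptions, not the paper's implementation: the function name `mine_cochange_rules`, the file names, and the threshold values are all hypothetical, and the genetic algorithm and its fitness function are not reproduced here.

```python
from collections import Counter
from itertools import permutations


def mine_cochange_rules(commits, min_support, min_confidence):
    """Return (antecedent, consequent, support, confidence) tuples for file pairs.

    Hypothetical helper for illustration only. `commits` is a list of commits,
    each given as a list of the file names changed in that commit.
    """
    n = len(commits)
    file_count = Counter()  # number of commits touching each file
    pair_count = Counter()  # number of commits touching both files of an ordered pair
    for files in commits:
        unique = set(files)
        file_count.update(unique)
        pair_count.update(permutations(sorted(unique), 2))

    rules = []
    for (a, b), together in pair_count.items():
        support = together / n                  # fraction of commits containing a and b
        confidence = together / file_count[a]   # fraction of a's commits that also change b
        if support >= min_support and confidence >= min_confidence:
            rules.append((a, b, support, confidence))
    return sorted(rules)


# Toy commit history (assumed data, not taken from the evaluated projects).
commits = [
    ["core.py", "utils.py"],
    ["core.py", "utils.py", "README.md"],
    ["core.py"],
    ["utils.py", "README.md"],
]

# The cut-offs below are arbitrary; choosing them per project is the
# configuration problem the abstract describes.
rules = mine_cochange_rules(commits, min_support=0.5, min_confidence=0.6)
for antecedent, consequent, support, confidence in rules:
    print(f"{antecedent} -> {consequent}: support={support:.2f}, confidence={confidence:.2f}")
```

Raising either cut-off, or passing only the most recent commits, changes which rules survive. The quality of the resulting recommendations therefore depends on the two thresholds and on the history length, which is the search space the proposed approach explores.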