Predicting Risky Clones Based on Machine Learning

Code clones are code fragments in source code that are similar or identical to one another. Code clones are often said to decrease the maintainability of software. However, not all code clones are necessarily harmful. In this study, we propose a method that uses machine learning techniques to identify the risky code clones among all the code clones in source code. The proposed method learns from the features of code clones that existed in the past, together with whether each of them turned out to be risky. Based on this information, it then predicts which code clones are risky. In a pilot study, we confirmed that the proposed method was able to predict risky code clones with high accuracy.
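The learn-then-predict idea described above can be illustrated with a minimal sketch. It is not the classifier used in this study: the three features (clone length in tokens, number of past changes, number of sibling clones), the sample data, and the nearest-centroid rule are all assumptions chosen only to keep the example small.

```python
from math import dist

# Hypothetical history of past clones: (tokens, past changes, siblings) -> risky?
# These values are invented for illustration, not taken from the study.
HISTORY = [
    ((120, 9, 6), True),
    ((95, 7, 5), True),
    ((110, 8, 4), True),
    ((20, 1, 2), False),
    ((15, 0, 2), False),
    ((30, 2, 3), False),
]


def centroid(rows):
    """Return the component-wise mean of a list of feature vectors."""
    n = len(rows)
    return tuple(sum(col) / n for col in zip(*rows))


def train(history):
    """Learn one centroid per class from labelled past clones."""
    risky = [features for features, label in history if label]
    safe = [features for features, label in history if not label]
    return {True: centroid(risky), False: centroid(safe)}


def predict(model, features):
    """Label a clone risky if it lies closer to the risky centroid."""
    return dist(features, model[True]) < dist(features, model[False])


model = train(HISTORY)
print(predict(model, (100, 6, 5)))  # a long, frequently changed clone
print(predict(model, (18, 1, 2)))   # a short, stable clone
```

In practice the features would likely need scaling, and any standard supervised classifier could take the place of the nearest-centroid rule; the sketch only shows how labelled historical clones are turned into a predictor for current ones.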
