Automatic Clone Recommendation for Refactoring Based on the Present and the Past

When many clones are detected in software programs, not all clones are equally important to developers. To help developers refactor code and improve software quality, various tools have been built to recommend clone-removal refactorings based on past and present information, such as the cohesion degree of individual clones or the co-evolution relations of clone peers. The existence of these tools inspired us to build an approach that considers as many factors as possible to recommend clones more accurately. This paper introduces CREC, a learning-based approach that recommends clones by extracting features from the current status and past history of software projects. Given a set of software repositories, CREC first automatically extracts the clone groups that were historically refactored (R-clones) and those that were not refactored (NR-clones) to construct the training set. CREC extracts 34 features to characterize the content and evolution behaviors of individual clones, as well as the spatial, syntactical, and co-change relations of clone peers. With these features, CREC trains a classifier that recommends clones for refactoring. We designed the largest feature set thus far for clone recommendation and evaluated it on six large projects. The results show that our approach suggested refactorings with F-scores of 83% in the within-project setting and 76% in the cross-project setting. CREC significantly outperforms a similar state-of-the-art approach on our data set, which achieved F-scores of 70% and 50%, respectively. We also compared the effectiveness of different factors and different learning algorithms.
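As an illustration of how an F-score for clone recommendation could be computed, the sketch below treats each clone group as a binary case: refactored (R-clone) or not (NR-clone). It assumes the F-score is the balanced F1 measure, the harmonic mean of precision and recall, which the abstract does not state explicitly. The function name `f_score` and the labels are hypothetical and are not taken from CREC or its data set.

```python
from typing import Sequence


def f_score(actual: Sequence[bool], predicted: Sequence[bool]) -> float:
    """F1 score for the positive class (clone group recommended for refactoring).

    Assumption: F-score means balanced F1; the abstract does not say which
    variant was used.
    """
    if len(actual) != len(predicted):
        raise ValueError("actual and predicted must have the same length")

    # True = R-clone (refactored), False = NR-clone (not refactored).
    tp = sum(1 for a, p in zip(actual, predicted) if a and p)
    fp = sum(1 for a, p in zip(actual, predicted) if not a and p)
    fn = sum(1 for a, p in zip(actual, predicted) if a and not p)

    if tp == 0:
        # No correct recommendation: precision or recall is zero or undefined.
        return 0.0

    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


# Hypothetical ground truth and classifier output for six clone groups.
actual = [True, True, True, False, False, False]
predicted = [True, True, False, True, False, False]

score = f_score(actual, predicted)
print(f"F-score: {score:.2f}")
```

In a within-project setting, the ground truth and the predictions would come from held-out clone groups of the same project that supplied the training data; in a cross-project setting, they would come from a project not seen during training.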
