Clonepedia: Summarizing Code Clones by Common Syntactic Context for Software Maintenance

Code clones need to be made explicit and managed during software maintenance. Researchers have developed many clone detection tools to detect and analyze code clones in software systems. These tools report code clones as similar code fragments in source files. However, clone-related maintenance tasks (e.g., refactorings) often involve a group of code clones that appear in a larger syntactic context (e.g., code clones in sibling classes or code clones calling similar methods). Given a list of low-level code-fragment clones, developers have to manually summarize, from the bottom up, the clones that are relevant to the syntactic context of a maintenance task. In this paper, we present a clone summarization technique that summarizes code clones with respect to their common syntactic context. The resulting summaries allow developers to locate and maintain code clones in a top-down manner, by type hierarchy and usage dependencies. We have implemented our approach in the Clonepedia tool and conducted a user study on JHotDraw with 16 developers. Our results show that Clonepedia users locate and refactor code clones better than developers using the Clone Detective tool.
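
To make the idea of a common syntactic context concrete, the following is a minimal sketch and not the Clonepedia implementation. It assumes a detector has already reported clone instances, and that each instance records its enclosing class, that class's direct supertype, and the names of the methods it calls. The `CloneInstance` record, the `summarize_by_context` function, and the two context kinds are hypothetical names chosen for illustration. The sketch groups clone sets under the two kinds of context mentioned above: clones in sibling classes and clones calling the same methods.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class CloneInstance:
    """One code fragment reported by a clone detector (hypothetical record)."""
    clone_set: str             # id of the clone set the fragment belongs to
    file: str                  # source file containing the fragment
    enclosing_class: str       # class in which the fragment appears
    superclass: str            # direct supertype of the enclosing class
    called_methods: frozenset  # names of methods invoked inside the fragment


def summarize_by_context(instances):
    """Map each shared syntactic context to the clone sets that exhibit it."""
    by_set = defaultdict(list)
    for inst in instances:
        by_set[inst.clone_set].append(inst)

    summary = defaultdict(set)
    for set_id, members in by_set.items():
        # Context 1: all instances live in different classes with one supertype.
        supers = {m.superclass for m in members}
        classes = {m.enclosing_class for m in members}
        if len(supers) == 1 and len(classes) > 1:
            summary[("sibling classes of", next(iter(supers)))].add(set_id)

        # Context 2: methods that every instance of the clone set calls.
        common_calls = frozenset.intersection(
            *(m.called_methods for m in members)
        )
        for name in common_calls:
            summary[("calls", name)].add(set_id)
    return dict(summary)
```

With such a summary, a developer could start from a context such as a supertype or a called method and navigate down to the relevant clone sets, rather than scanning a flat list of fragments.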
