Using Slicing to Identify Duplication in Source Code

Programs often contain a great deal of duplicated code, which makes both understanding and maintenance more difficult. This problem can be alleviated by detecting duplicated code, extracting it into a separate new procedure, and replacing all the clones (the instances of the duplicated code) with calls to the new procedure. This paper describes the design and initial implementation of a tool that finds clones and displays them to the programmer. The novel aspect of our approach is the use of program dependence graphs (PDGs) and program slicing to find isomorphic PDG subgraphs that represent clones. The key benefits of this approach are that our tool can find non-contiguous clones (clones whose components do not occur as contiguous text in the program), clones in which matching statements have been reordered, and clones that are intertwined with each other. Furthermore, the clones that are found are likely to be meaningful computations, and thus good candidates for extraction.
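To make the idea of slicing two PDG regions in step more concrete, here is a minimal sketch in Python. It is not the tool's implementation: the `PDG` class, the `lockstep_backward_slice` function, the normalized statement labels, and the toy program are all hypothetical, chosen only for illustration. The sketch assumes that two statements "match" when their labels (statement text with variable names abstracted away) are equal, and it grows a pair of isomorphic subgraphs by following dependence edges backward from a matching pair of nodes.

```python
from collections import defaultdict


class PDG:
    """Toy program dependence graph (hypothetical structure for illustration)."""

    def __init__(self):
        self.label = {}                 # node id -> normalized statement label
        self.preds = defaultdict(list)  # node id -> [(predecessor id, edge kind)]

    def add_node(self, node, label):
        self.label[node] = label

    def add_edge(self, src, dst, kind):
        # kind is "data" or "control"; dst depends on src.
        self.preds[dst].append((src, kind))


def lockstep_backward_slice(pdg, root_a, root_b):
    """Grow two matching subgraphs backward from a pair of matching nodes.

    Returns a dict mapping each node of the first subgraph to its partner
    in the second. An empty dict means the roots do not match.
    """
    if root_a == root_b or pdg.label[root_a] != pdg.label[root_b]:
        return {}
    mapping = {root_a: root_b}
    used = {root_a, root_b}
    worklist = [(root_a, root_b)]
    while worklist:
        a, b = worklist.pop()
        for pa, kind_a in pdg.preds[a]:
            if pa in used:
                continue
            for pb, kind_b in pdg.preds[b]:
                if pb in used or pb == pa:
                    continue
                # Pair predecessors reached by the same kind of dependence
                # edge and carrying the same normalized label.
                if kind_a == kind_b and pdg.label[pa] == pdg.label[pb]:
                    mapping[pa] = pb
                    used.update((pa, pb))
                    worklist.append((pa, pb))
                    break
    return mapping


# Toy program: nodes 1-4 and 5-8 are two fragments with different variable
# names; nodes 2 and 7 are unrelated statements sitting between clone parts.
pdg = PDG()
pdg.add_node(1, "x = read()")
pdg.add_node(2, "log()")
pdg.add_node(3, "x = y * 2")
pdg.add_node(4, "print(x)")
pdg.add_node(5, "x = read()")
pdg.add_node(6, "x = y * 2")
pdg.add_node(7, "x += 1")
pdg.add_node(8, "print(x)")
pdg.add_edge(1, 3, "data")
pdg.add_edge(3, 4, "data")
pdg.add_edge(5, 6, "data")
pdg.add_edge(6, 8, "data")

clone = lockstep_backward_slice(pdg, 4, 8)
print(sorted(clone.items()))
```

In this toy example the slice pairs nodes 4, 3, and 1 with nodes 8, 6, and 5, and skips the unrelated statements at nodes 2 and 7. This illustrates how a dependence-based match can recover a clone that is not contiguous in the program text. A real tool would need further steps, such as choosing the starting pairs and filtering the results, which this sketch leaves out.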
