An Extended Line-Based Approach to Detect Code Clones Using Syntactic and Lexical Information

This paper proposes a new line-based approach to detecting code clones using syntactic and lexical information. A customized compiler emits a source code representation that contains this information. A new clone detection tool, LePalex, reads the representation and converts it into three forms: first normal form, second normal form, and third normal form. The first normal form is used to detect exact-match code clones, the second to detect syntactic-match code clones, and the third to check for syntactically correct segments within code clones. This paper demonstrates the advantage of the approach: by relying on syntactic and lexical information, it achieves programming language independence.
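To make the distinction between exact and syntactic matching concrete, the following is a minimal Python sketch of line-based clone detection under two normalizations. It is not LePalex and does not reproduce the paper's normal forms, whose definitions the abstract does not give. The regex tokenizer stands in for the customized compiler's output, and the names `exact_form`, `syntactic_form`, `find_clones`, and the `KEYWORDS` set are all hypothetical, chosen only for illustration.

```python
import re

# Assumption: a crude regex tokenizer replaces the compiler-emitted representation.
TOKEN = re.compile(r"[A-Za-z_]\w*|\d+|\S")
# Assumption: a tiny illustrative keyword set, not tied to any real language.
KEYWORDS = {"if", "else", "for", "while", "return", "int"}


def exact_form(line):
    """Hypothetical exact-match form: tokens joined by single spaces."""
    return " ".join(TOKEN.findall(line))


def syntactic_form(line):
    """Hypothetical syntactic form: identifiers become ID, numbers become NUM."""
    out = []
    for tok in TOKEN.findall(line):
        if tok in KEYWORDS:
            out.append(tok)
        elif tok[0].isalpha() or tok[0] == "_":
            out.append("ID")
        elif tok[0].isdigit():
            out.append("NUM")
        else:
            out.append(tok)
    return " ".join(out)


def find_clones(lines, normalize, min_len=2):
    """Return (i, j, length) for runs of lines whose normalized forms match.

    i and j are 0-based start indices with i < j; blank lines end a run,
    and a run starting at i is never extended into the run starting at j.
    """
    norm = [normalize(line) for line in lines]
    n = len(norm)
    clones = []
    for i in range(n):
        for j in range(i + 1, n):
            if not norm[i] or norm[i] != norm[j]:
                continue
            # Skip pairs that merely continue a run that started earlier.
            if i > 0 and norm[i - 1] and norm[i - 1] == norm[j - 1]:
                continue
            k = 0
            while (i + k < j and j + k < n
                   and norm[i + k] and norm[i + k] == norm[j + k]):
                k += 1
            if k >= min_len:
                clones.append((i, j, k))
    return clones


sample = [
    "int a = 1;",
    "a = a + 2;",
    "return a;",
    "",
    "int b = 1;",
    "b = b + 2;",
    "return b;",
]
exact_clones = find_clones(sample, exact_form)
syntactic_clones = find_clones(sample, syntactic_form)
```

In this sample the two three-line fragments differ only in a variable name, so the exact form reports no clone while the syntactic form reports one run of three lines. The quadratic pairwise scan is chosen for readability; a practical tool would more likely hash or index the normalized lines.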