Clone detection in source code by frequent itemset techniques

In this paper we describe a new approach for the detection of clones in source code, which is inspired by the concept of frequent itemsets from data mining. The source code is represented as an abstract syntax tree in XML. Currently, such XML representations exist for instance for Java, C++, or PROLOG. Our approach is very flexible; it can be configured easily to work with multiple programming languages

[1]  Serge Demeyer,et al.  Evaluating clone detection techniques , 2003 .

[2]  Jürgen Wolff von Gudenberg,et al.  Comprehending and visualizing software based on XML-representations and call graphs , 2003, 11th IEEE International Workshop on Program Comprehension, 2003..

[3]  Brenda S. Baker,et al.  On finding duplication and near-duplication in large software systems , 1995, Proceedings of 2nd Working Conference on Reverse Engineering.

[4]  Magdalena Balazinska,et al.  Measuring clone based reengineering opportunities , 1999, Proceedings Sixth International Software Metrics Symposium (Cat. No.PR00403).

[5]  Andrian Marcus,et al.  Source code files as structured documents , 2002, Proceedings 10th International Workshop on Program Comprehension.

[6]  Elizabeth Burd,et al.  Evaluating clone detection tools for use during preventative maintenance , 2002, Proceedings. Second IEEE International Workshop on Source Code Analysis and Manipulation.

[7]  Ettore Merlo,et al.  Experiment on the automatic detection of function clones in a software system using metrics , 1996, 1996 Proceedings of International Conference on Software Maintenance.

[8]  Stéphane Ducasse,et al.  A language independent approach for detecting duplicated code , 1999, Proceedings IEEE International Conference on Software Maintenance - 1999 (ICSM'99). 'Software Maintenance for Business Change' (Cat. No.99CB36360).

[9]  Shinji Kusumoto,et al.  CCFinder: A Multilinguistic Token-Based Code Clone Detection System for Large Scale Source Code , 2002, IEEE Trans. Software Eng..

[10]  Jens Krinke,et al.  Identifying similar code with program dependence graphs , 2001, Proceedings Eighth Working Conference on Reverse Engineering.

[11]  Dietmar Seipel,et al.  Analyzing and Visualising Prolog programs based on XML representations , 2003, WLPE.