Clone detection in source code by frequent itemset techniques

In this paper we describe a new approach to detecting clones in source code, inspired by the concept of frequent itemsets from data mining. The source code is represented as an abstract syntax tree in XML. Such XML representations currently exist for Java, C++, and PROLOG, among other languages. Our approach is flexible: it can easily be configured to work with multiple programming languages.
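To make the idea concrete, the following is a minimal sketch of how frequent itemsets might be mined from an XML syntax tree. It is an illustration under stated assumptions, not the algorithm of the paper: the XML element names, the choice of one transaction per function, the use of tag-only subtree shapes as items, and the helper names `shape`, `transactions`, and `frequent_itemsets` are all hypothetical. Sets of statement shapes that co-occur in at least `min_support` functions are reported as clone candidates.

```python
import xml.etree.ElementTree as ET
from collections import Counter
from itertools import combinations

# Hypothetical XML syntax tree; the element names are invented for illustration.
AST_XML = """
<program>
  <function name="f">
    <assign><var/><call/></assign>
    <if><cond/><return/></if>
    <return><var/></return>
  </function>
  <function name="g">
    <assign><var/><call/></assign>
    <if><cond/><return/></if>
    <loop><assign><var/><const/></assign></loop>
  </function>
  <function name="h">
    <return><const/></return>
  </function>
</program>
"""

def shape(node):
    """Serialize a subtree by tag names only, ignoring attributes and text."""
    return node.tag + "(" + ",".join(shape(child) for child in node) + ")"

def transactions(root):
    """Assumption: one transaction per function, holding its statement shapes."""
    return {fn.get("name"): {shape(stmt) for stmt in fn}
            for fn in root.iter("function")}

def frequent_itemsets(tx, size, min_support):
    """Keep itemsets of the given size that occur in at least min_support transactions."""
    counts = Counter()
    for items in tx.values():
        # Sorting gives each itemset one canonical tuple form.
        for combo in combinations(sorted(items), size):
            counts[combo] += 1
    return {combo: n for combo, n in counts.items() if n >= min_support}

root = ET.fromstring(AST_XML)
tx = transactions(root)
candidates = frequent_itemsets(tx, size=2, min_support=2)
for itemset, support in candidates.items():
    print(support, itemset)
```

In this toy input, the functions `f` and `g` share an assignment and a conditional of the same shape, so that pair is the only itemset of size 2 that reaches the support threshold. Because the sketch looks only at element tags, adapting it to another language would amount to supplying a different XML representation, which loosely mirrors the configurability described above.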
