Token-based Code Clone Detection Technique in a Student's Programming Exercise

Submitting programs copied from other people is a problem in programming exercise courses in university curricula: teachers cannot grade accurately or evaluate the learning level that students have reached. Code clone detection techniques aim to detect copied programs automatically. Such techniques have been proposed in prior research, but that research has focused on industrial source code, and problems remain in detecting illicitly copied code in reports written by students. In this research, we developed a code clone detection algorithm that focuses on detecting illicitly copied code in the reports students submit for a programming exercise. The proposed algorithm is based on the comparison of tokens and can declare illicitly copied code invalid. It detects the typical features of illicit copies: swapped functions and program lines; renamed variables; changed digits, comments, and string constants; and source code altered with formatting tools. We implemented the proposed algorithm and evaluated the system experimentally on the submissions of 119 students. Compared with human detection on the small source codes that students write in a programming exercise, our system found 32 of the 36 illicitly copied codes among 14,042 combinations, using a threshold that achieves recall = 0.8. It falsely detected 72 codes as copies, giving precision = 0.302.
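The idea of token comparison described above can be illustrated with a minimal sketch. It is not the algorithm of this paper, whose details the abstract does not give. It assumes C-like source code, a hypothetical keyword list, and a similarity measure (Dice coefficient over token 4-grams) chosen here only for illustration. Comments and layout are discarded, and identifiers, numbers, and string constants are replaced by placeholder tokens, so that renaming or reformatting leaves the token stream unchanged. Comparing multisets of n-grams rather than whole sequences makes the score largely insensitive to reordered functions or lines.

```python
import re
from collections import Counter

# Hypothetical keyword set for a C-like language; not taken from the paper.
KEYWORDS = {"int", "char", "float", "double", "void", "if", "else",
            "for", "while", "do", "return", "break", "continue", "struct"}

TOKEN_RE = re.compile(r"""
    (?P<comment>//[^\n]*|/\*.*?\*/)
  | (?P<string>"(?:\\.|[^"\\])*"|'(?:\\.|[^'\\])*')
  | (?P<number>\d+(?:\.\d+)?)
  | (?P<ident>[A-Za-z_]\w*)
  | (?P<op>[^\s\w])
""", re.VERBOSE | re.DOTALL)


def normalize(source):
    """Turn source text into a token list that ignores cosmetic edits."""
    tokens = []
    for match in TOKEN_RE.finditer(source):
        kind, text = match.lastgroup, match.group()
        if kind == "comment":
            continue                      # comments are dropped entirely
        if kind == "string":
            tokens.append("STR")          # any string constant looks the same
        elif kind == "number":
            tokens.append("NUM")          # any numeric literal looks the same
        elif kind == "ident":
            # Keywords are kept; every other name becomes a placeholder.
            tokens.append(text if text in KEYWORDS else "ID")
        else:
            tokens.append(text)           # operators and punctuation
    return tokens


def ngrams(tokens, n=4):
    """Multiset of consecutive token n-grams."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def similarity(source_a, source_b, n=4):
    """Dice coefficient over token n-gram multisets, in the range 0.0 to 1.0."""
    grams_a = ngrams(normalize(source_a), n)
    grams_b = ngrams(normalize(source_b), n)
    total = sum(grams_a.values()) + sum(grams_b.values())
    if total == 0:
        return 0.0
    shared = sum((grams_a & grams_b).values())
    return 2 * shared / total


original = "int main() { int x = 10; /* counter */ return x; }"
disguised = "int main(){\n  int total = 99; // changed\n  return total;\n}"
print(similarity(original, disguised))  # → 1.0
```

A pair of submissions would then be reported as a suspected copy when its score exceeds a chosen threshold. Lowering the threshold raises recall at the cost of precision, which is the trade-off the reported figures reflect.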
