A Novel Approach for Detecting Logic Similarity in Plagiarised Source Code

Source code plagiarism is a persistent problem in computer science education. Many tools have been developed to identify indications of source code plagiarism in large data sets. These tools are effective at identifying simple cases of plagiarism (e.g. renaming identifiers or shuffling declarations), but they are vulnerable to semantics-preserving obfuscations. This weakness arises because they analyse the structure of the source code rather than the program logic it implements. In this paper, a novel approach to source code plagiarism detection is proposed that identifies similarity by comparing the logic embedded in two programs. The approach is evaluated on a data set of simulated plagiarism, and the evaluation demonstrates that it is resilient to semantics-preserving transformations.
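The contrast drawn above can be illustrated with a minimal sketch. It is not the logic-based method proposed in this paper. It is a generic token-based comparison of the kind that structural tools rely on.

The sketch makes several assumptions. It works on Python source. Each non-keyword identifier is normalised to a placeholder. Token trigrams are compared using Jaccard similarity. The function names, the trigram size and the similarity measure are all hypothetical choices for illustration.

```python
import io
import keyword
import tokenize


def normalised_tokens(source: str) -> list[str]:
    """Tokenise Python source, replacing identifiers and literals with
    placeholders so that renaming has no effect (illustrative only)."""
    out = []
    for tok in tokenize.generate_tokens(io.StringIO(source).readline):
        if tok.type == tokenize.NAME:
            out.append(tok.string if keyword.iskeyword(tok.string) else "ID")
        elif tok.type == tokenize.NUMBER:
            out.append("NUM")
        elif tok.type == tokenize.STRING:
            out.append("STR")
        elif tok.type == tokenize.OP:
            out.append(tok.string)
        # Comments, NEWLINE/NL, INDENT/DEDENT and ENDMARKER are ignored.
    return out


def ngrams(tokens: list[str], n: int = 3) -> set[tuple[str, ...]]:
    """Return the set of contiguous token n-grams."""
    return {tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)}


def similarity(a: str, b: str, n: int = 3) -> float:
    """Jaccard similarity of the normalised token n-grams of two programs."""
    ga, gb = ngrams(normalised_tokens(a), n), ngrams(normalised_tokens(b), n)
    if not ga and not gb:
        return 1.0
    return len(ga & gb) / len(ga | gb)


ORIGINAL = '''
def total(xs):
    s = 0
    for x in xs:
        s += x
    return s
'''

# Simple plagiarism: identifiers renamed, structure unchanged.
RENAMED = '''
def accumulate(values):
    acc = 0
    for v in values:
        acc += v
    return acc
'''

# Semantics-preserving rewrite: same result, different control structure.
REWRITTEN = '''
def total(xs):
    s = 0
    i = 0
    while i < len(xs):
        s = s + xs[i]
        i += 1
    return s
'''

if __name__ == "__main__":
    print(similarity(ORIGINAL, RENAMED))    # 1.0
    print(similarity(ORIGINAL, REWRITTEN))  # much lower, despite identical behaviour
```

The renamed copy produces exactly the same normalised token stream, so it scores 1.0. The while-loop rewrite computes the same sum but shares only a few trigrams with the original. This is the kind of gap a comparison of program logic, rather than program structure, is intended to close.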
