SPPlagiarise: A Tool for Generating Simulated Semantics-Preserving Plagiarism of Java Source Code

Source code plagiarism is a common occurrence in undergraduate computer science education. Studies have indicated that at least 50% of students plagiarize during their undergraduate career. To identify cases of source code plagiarism, many source code plagiarism detection tools have been proposed. However, conclusively determining the effectiveness of these tools at identifying cases of source code plagiarism is difficult, because evaluations are typically performed using unreleased data sets. Without a comprehensive, publicly available data set for source code plagiarism detection evaluation, it is difficult to perform unbiased and reproducible evaluations of tools. To address this problem, this paper presents a tool, SPPlagiarise, which is designed to produce simulated source code plagiarism of Java source code. SPPlagiarise applies a random number of semantics-preserving source code obfuscations at random locations in a Java code base to simulate source code plagiarism. This paper presents the design of the tool and an evaluation of a generated plagiarism data set.
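
The idea of applying a random number of semantics-preserving edits at random locations can be illustrated with a minimal sketch. This is not SPPlagiarise's implementation: the class name `ObfuscationSketch`, the method `obfuscate`, its parameters, and the two layout-only edits (inserting a comment line or a blank line) are assumptions chosen for illustration. It also assumes the input contains no Java text blocks, since inserting a line inside one would change a string value.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Random;

public class ObfuscationSketch {

    /**
     * Hypothetical helper: applies between 1 and maxEdits layout-only edits
     * at random line boundaries. Assumes maxEdits >= 1, and assumes the
     * source contains no text blocks (a line inserted inside one would
     * alter program behaviour).
     */
    static List<String> obfuscate(List<String> lines, int maxEdits, long seed) {
        Random rng = new Random(seed);           // seeded so a data set can be regenerated
        List<String> out = new ArrayList<>(lines);
        int edits = 1 + rng.nextInt(maxEdits);   // random number of transformations
        for (int i = 0; i < edits; i++) {
            int pos = rng.nextInt(out.size() + 1); // random location (a line boundary)
            if (rng.nextBoolean()) {
                out.add(pos, "// reviewed");     // comments are ignored by the compiler
            } else {
                out.add(pos, "");                // blank lines are ignored as well
            }
        }
        return out;
    }

    public static void main(String[] args) {
        List<String> source = List.of(
            "int total = 0;",
            "for (int i = 0; i < 10; i++) {",
            "    total += i;",
            "}");
        for (String line : obfuscate(source, 3, 42L)) {
            System.out.println(line);
        }
    }
}
```

Each variant produced this way differs textually from the original while behaving identically, which is the property a simulated plagiarism data set relies on. A realistic generator would draw from a richer set of transformations than these two.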
