AuDeNTES: Automatic Detection of teNtative plagiarism according to a rEference Solution

In academic courses, students frequently take advantage of someone else’s work to improve their own evaluations or grades. This unethical behavior seriously threatens the integrity of the academic system, and teachers invest substantial effort in preventing and recognizing plagiarism. When students take examinations that require them to write computer programs, plagiarism detection can be semiautomated using tools such as JPlag and Moss. These tools are useful but lose effectiveness when the text of the exam suggests some of the elements that should be structurally part of the solution. Effectiveness drops because the programs then share many common parts that stem from the suggestions in the exam text rather than from plagiarism. In this article, we present the AuDeNTES anti-plagiarism technique. AuDeNTES detects plagiarism by comparing the code fragments that best represent each student’s individual contribution: it filters out of the students’ submissions the parts that are likely to be common to many students because of the suggestions in the exam text. The filtered parts are identified by comparing the students’ submissions against a reference solution, which is a solution to the exam developed by the teachers. Specifically, AuDeNTES first produces tokenized versions of both the reference solution and the programs that must be analyzed. Then, AuDeNTES removes from the tokenized programs the tokens that are included in the tokenized reference solution. Finally, AuDeNTES computes the similarity among the filtered tokenized programs and produces a ranked list of program pairs suspected of plagiarism. An empirical comparison against multiple state-of-the-art plagiarism detection techniques, using several sets of real students’ programs collected in early programming courses, showed that AuDeNTES identifies more plagiarism cases than the other techniques at the cost of a small additional inspection effort.
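The three steps described above (tokenize, filter against the reference solution, rank pairs by similarity) can be illustrated with a minimal Python sketch. It is not the AuDeNTES implementation: the abstract does not specify the tokenizer, the exact filtering rule, or the similarity measure, so the regex tokenizer, the set-membership filter, and the Jaccard similarity used here are all assumptions chosen for brevity, and the function names are hypothetical.

```python
import re
from itertools import combinations

# Assumption: a crude lexer (identifiers, integers, single symbols).
# The real technique presumably uses a language-aware tokenizer.
TOKEN_RE = re.compile(r"[A-Za-z_]\w*|\d+|\S")


def tokenize(source):
    """Split source text into a flat list of tokens."""
    return TOKEN_RE.findall(source)


def filter_tokens(tokens, reference_tokens):
    """Drop every token that also occurs in the reference solution.

    Assumption: plain set membership; the abstract only says that tokens
    included in the tokenized reference solution are removed.
    """
    reference = set(reference_tokens)
    return [t for t in tokens if t not in reference]


def jaccard(a, b):
    """Jaccard similarity of two token lists (a stand-in similarity measure)."""
    sa, sb = set(a), set(b)
    if not sa and not sb:
        return 0.0  # nothing left after filtering: no evidence of copying
    return len(sa & sb) / len(sa | sb)


def rank_pairs(submissions, reference_source):
    """Return (score, name_a, name_b) tuples, most suspicious pair first."""
    ref = tokenize(reference_source)
    filtered = {
        name: filter_tokens(tokenize(src), ref)
        for name, src in submissions.items()
    }
    scored = [
        (jaccard(filtered[a], filtered[b]), a, b)
        for a, b in combinations(sorted(filtered), 2)
    ]
    scored.sort(key=lambda item: -item[0])  # stable sort keeps name order on ties
    return scored


if __name__ == "__main__":
    reference = "int sum(int a, int b) { return a + b; }"
    submissions = {
        "alice": "int sum(int a, int b) { int tmp = a + b; log(tmp); return tmp; }",
        "bob": "int sum(int a, int b) { int tmp = a + b; log(tmp); return tmp; }",
        "carol": "int sum(int a, int b) { return b + a; }",
    }
    for score, a, b in rank_pairs(submissions, reference):
        print(f"{score:.2f}  {a}  {b}")
```

In this toy input, the function signature and the `return` statement are shared by everyone because they mirror the reference solution, so they are filtered out and only the extra `tmp`/`log` code contributes to the score. A teacher would then inspect the pairs from the top of the ranked list downward.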
