Hermetic and Web Plagiarism Detection Systems for Student Essays—An Evaluation of the State-of-the-Art

Plagiarism has become a serious problem in education, and several plagiarism detection systems have been developed to address it. This study provides an empirical evaluation of eight plagiarism detection systems for student essays. We present a categorical hierarchy of the most common types of plagiarism encountered in student texts. Our purpose-built test set contains texts in which instances of several commonly used plagiarism techniques have been embedded. Sherlock was clearly the best hermetic detection system overall, while SafeAssignment performed best in detecting web plagiarism. TurnitIn was found to be the most advanced system for detecting semi-automatic forms of plagiarism, such as the substitution of Cyrillic equivalents for certain characters or the insertion of fake whitespaces. The survey indicates that none of the systems can reliably detect plagiarism from both local and Internet sources while also identifying the technical tricks that plagiarizers use to conceal plagiarism.
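To make the Cyrillic substitution trick concrete, the following is a minimal sketch of how such a disguise might be undone or flagged before comparison. It is not the method used by any of the evaluated systems; the mapping table `HOMOGLYPHS` and the functions `normalize_homoglyphs` and `suspicious_words` are hypothetical names introduced here for illustration, and the table covers only a handful of lookalike letters.

```python
# Assumption: a small, hand-picked table of Cyrillic letters that look like
# Latin letters. A real detector would need a far more complete table.
HOMOGLYPHS = {
    "\u0430": "a", "\u0435": "e", "\u043e": "o", "\u0440": "p",
    "\u0441": "c", "\u0445": "x", "\u0443": "y",
    "\u0410": "A", "\u0415": "E", "\u041e": "O", "\u0420": "P",
    "\u0421": "C", "\u0425": "X",
}


def normalize_homoglyphs(text: str) -> str:
    """Replace each mapped Cyrillic lookalike with its Latin counterpart."""
    return "".join(HOMOGLYPHS.get(ch, ch) for ch in text)


def suspicious_words(text: str) -> list[str]:
    """Return words that mix ASCII letters with mapped Cyrillic lookalikes."""
    flagged = []
    for word in text.split():
        has_lookalike = any(ch in HOMOGLYPHS for ch in word)
        has_latin = any(ch.isascii() and ch.isalpha() for ch in word)
        if has_lookalike and has_latin:
            flagged.append(word)
    return flagged


if __name__ == "__main__":
    disguised = "pl\u0430gi\u0430rism is b\u0430d"
    print(normalize_homoglyphs(disguised))
    print(suspicious_words(disguised))
```

Normalizing before comparison lets an ordinary string matcher see through the substitution, while flagging mixed-script words gives a reviewer a direct signal that the text was deliberately altered.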
