Paper information - On Automatic Plagiarism Detection Based on n-Grams Comparison


When automatic plagiarism detection is carried out against a reference corpus, a suspicious text is compared to a set of original documents in order to relate the plagiarised text fragments to their potential source. One of the biggest difficulties in this task is locating plagiarised fragments that have been modified from the source text (by rewording, insertion or deletion, for example). The definition of proper text chunks as comparison units for the suspicious and original texts is crucial for the success of this kind of application. Our experiments with the METER corpus show that the best results are obtained when considering low-level word n-gram comparisons (n ∈ {2, 3}).
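The idea of comparing texts through word n-grams can be illustrated with a minimal sketch. The abstract does not state which similarity measure or preprocessing is used, so everything below is an assumption made for illustration: the helper names `word_ngrams` and `containment`, the lowercasing and regex tokenisation, and the choice of a containment score (the share of the suspicious text's n-grams that also occur in the source).

```python
import re


def word_ngrams(text, n):
    """Return the set of word n-grams (as tuples) found in `text`."""
    # Assumed preprocessing: lowercase, keep runs of word characters.
    tokens = re.findall(r"\w+", text.lower())
    return {tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)}


def containment(suspicious, source, n=2):
    """Fraction of the suspicious text's n-grams that also occur in the source."""
    susp = word_ngrams(suspicious, n)
    if not susp:
        return 0.0
    return len(susp & word_ngrams(source, n)) / len(susp)


# Toy example: one word substituted and one word replaced at the start.
source = "the quick brown fox jumps over the lazy dog"
suspicious = "a quick brown fox leaps over the lazy dog"

for n in (1, 2, 3):
    print(n, round(containment(suspicious, source, n), 3))
```

In this toy pair the score drops as n grows, because every modified word breaks all n-grams that span it. This is one way to see why short n-grams may tolerate rewording better than long ones, although the toy example says nothing about the results reported for the METER corpus.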

Alberto Barrón-Cedeño | Paolo Rosso

[1] James A. Malcolm, et al. A theoretical basis to the automated detection of copying between texts, and its practical implementation in the Ferret plagiarism and collusion detector, 2004.

[2] Martin F. Porter, et al. An algorithm for suffix stripping, 1997, Program.

[3] James A. Malcolm, et al. Detecting Short Passages of Similar Text in Large Document Collections, 2001, EMNLP.

[4] Alexander F. Gelbukh, et al. PPChecker: Plagiarism Pattern Checker in Document Copy Detection, 2006, TSD.

[5] Robert J. Gaizauskas, et al. Building and annotating a corpus for the study of journalistic text reuse, 2002, LREC.