A Pairwise Document Analysis Approach for Monolingual Plagiarism Detection

The task of plagiarism detection entails two main steps, suspicious candidate retrieval and pairwise document similarity analysis also called detailed analysis. In this paper we focus on the second subtask. We will report our monolingual plagiarism detection system which is used to process the Persian plagiarism corpus for the task of pairwise document similarity. To retrieve plagiarised passages a plagiarism detection method based on vector space model, insensitive to context reordering, is presented. We evaluate the performance in terms of precision, recall, granularity and plagdet metrics.

[1]  Benno Stein,et al.  An Evaluation Framework for Plagiarism Detection , 2010, COLING.

[2]  Mark Stevenson,et al.  Developing a corpus of plagiarised short answers , 2011, Lang. Resour. Evaluation.

[3]  Efstathios Stamatatos,et al.  Plagiarism detection using stopword n-grams , 2011, J. Assoc. Inf. Sci. Technol..

[4]  Benno Stein,et al.  A Wikipedia-Based Multilingual Retrieval Model , 2008, ECIR.

[5]  Gurmeet Singh Manku,et al.  Detecting near-duplicates for web crawling , 2007, WWW '07.

[6]  Azadeh Shakery,et al.  Using a Dictionary and n-gram Alignment to Improve Fine-grained Cross-Language Plagiarism Detection , 2016, DocEng.

[7]  Alberto Barrón-Cedeño,et al.  Reducing the Plagiarism Detection Search Space on the Basis of the Kullback-Leibler Distance , 2009, CICLing.

[8]  Matthias Hagen,et al.  Overview of the 1st international competition on plagiarism detection , 2009 .

[9]  Rynson W. H. Lau,et al.  CHECK: a document plagiarism detection system , 1997, SAC '97.

[10]  Daniel Shawcross Wilkerson,et al.  Winnowing: local algorithms for document fingerprinting , 2003, SIGMOD '03.

[11]  Hector Garcia-Molina,et al.  Copy detection mechanisms for digital documents , 1995, SIGMOD '95.

[12]  Hao-Ren Ke,et al.  Plagiarism Detection using ROUGE and WordNet , 2010, ArXiv.

[13]  Juan D. Velásquez,et al.  Text mining applied to plagiarism detection: The use of words for detecting deviations in the writing style , 2013, Expert Syst. Appl..

[14]  Benno Stein,et al.  Improving the Reproducibility of PAN's Shared Tasks: - Plagiarism Detection, Author Identification, and Author Profiling , 2014, CLEF.

[15]  Ion Androutsopoulos,et al.  A Survey of Paraphrasing and Textual Entailment Methods , 2009, J. Artif. Intell. Res..

[16]  Benno Stein,et al.  Cross-language plagiarism detection , 2011, Lang. Resour. Evaluation.

[17]  Alberto Barrón-Cedeño,et al.  Plagiarism Meets Paraphrasing: Insights for the Next Generation in Automatic Plagiarism Detection , 2013, CL.

[18]  Benno Stein,et al.  Intrinsic Plagiarism Detection , 2006, ECIR.

[19]  Cristian Grozea,et al.  ENCOPLOT: Pairwise Sequence Matching in Linear Time Applied to Plagiarism Detection ∗ , 2009 .

[20]  Paolo Rosso,et al.  Algorithms and Corpora for Persian Plagiarism Detection: Overview of PAN at FIRE 2016 , 2016, FIRE.

[21]  Benno Stein,et al.  Ousting ivory tower research: towards a web framework for providing experiments as a service , 2012, SIGIR '12.

[22]  Azadeh Shakery,et al.  Candidate document retrieval for cross-lingual plagiarism detection using two-level proximity information , 2016, Inf. Process. Manag..

[23]  Lucia Specia,et al.  Lexical Generalisation for Word-level Matching in Plagiarism Detection , 2011, RANLP.

[24]  Renata de Matos Galante,et al.  A New Approach for Cross-Language Plagiarism Analysis , 2010, CLEF.