Lexical Generalisation for Word-level Matching in Plagiarism Detection

Plagiarism has long been a concern in many sectors, particularly in education. With the sharp rise in the number of electronic resources available online, an increasing number of plagiarism cases has been observed in recent years. Because the amount of source material is vast, plagiarism detection tools have become the norm for investigating possible plagiarism cases. This paper describes an approach to improving plagiarism detection by incorporating a lexical generalisation technique. The goal is to identify plagiarised texts even when they have been paraphrased using different words. Experiments performed on a subset of the PAN'10 corpus show that the matching approach involving lexical generalisation yields promising results compared with standard n-gram matching.
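The contrast between standard n-gram matching and matching after lexical generalisation can be illustrated with a minimal sketch. It is not the implementation evaluated in the paper: the lexicon `TOY_CONCEPTS`, the helper names, the trigram size and the Jaccard overlap score are all assumptions made for illustration, and a real system would presumably draw its word groupings from a lexical resource rather than a hand-written table.

```python
from typing import Dict, List, Set, Tuple

# Hypothetical toy lexicon (an assumption, not from the paper): each word
# is mapped to a shared concept label that stands in for a synonym group.
TOY_CONCEPTS: Dict[str, str] = {
    "car": "VEHICLE", "automobile": "VEHICLE",
    "fast": "FAST", "rapid": "FAST", "quick": "FAST",
    "bought": "BUY", "purchased": "BUY",
}

PUNCTUATION = ".,;:!?\"'"


def tokenize(text: str) -> List[str]:
    """Lowercase the text and strip surrounding punctuation from each word."""
    words = (w.strip(PUNCTUATION).lower() for w in text.split())
    return [w for w in words if w]


def generalise(tokens: List[str]) -> List[str]:
    """Replace each token by its concept label; unknown words are kept."""
    return [TOY_CONCEPTS.get(t, t) for t in tokens]


def ngrams(tokens: List[str], n: int) -> Set[Tuple[str, ...]]:
    """Return the set of word n-grams in a token sequence."""
    return {tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)}


def similarity(source: str, suspicious: str, n: int = 3,
               generalised: bool = False) -> float:
    """Jaccard overlap of word n-grams, optionally after generalisation."""
    a, b = tokenize(source), tokenize(suspicious)
    if generalised:
        a, b = generalise(a), generalise(b)
    grams_a, grams_b = ngrams(a, n), ngrams(b, n)
    if not grams_a and not grams_b:
        return 0.0
    return len(grams_a & grams_b) / len(grams_a | grams_b)


source = "She bought a fast car yesterday."
suspicious = "She purchased a rapid automobile yesterday."
plain_score = similarity(source, suspicious)
generalised_score = similarity(source, suspicious, generalised=True)
```

In this toy pair every trigram of the paraphrase contains a substituted word, so plain trigram matching finds no overlap, whereas the generalised sequences coincide. Real paraphrases are rarely this clean, and generalisation can also create spurious matches between unrelated passages, so the score would normally be combined with a threshold tuned on labelled data.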
