Detection of Plagiarism in Arabic Documents

Many language-sensitive tools for detecting plagiarism in natural language documents have been developed, particularly for English. Language-independent tools exist as well, but they are considered restrictive because they usually do not take specific language features into account. Detecting plagiarism in Arabic documents is a particularly challenging task because of the complex linguistic structure of Arabic. In this paper, we present a plagiarism detection tool that compares Arabic documents to identify potential similarities. The tool is based on a new comparison algorithm that uses heuristics to compare suspect documents at different hierarchical levels, thereby avoiding unnecessary comparisons. We evaluate its performance in terms of precision and recall on a large data set of Arabic documents and show that it can identify both direct copying and more sophisticated copying, such as sentence reordering and synonym substitution. We also demonstrate its advantages over other plagiarism detection tools, including Turnitin, the well-known language-independent tool.
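
To make the idea of hierarchical comparison concrete, the following is a minimal sketch and not the algorithm of the tool itself. It assumes three levels (document, paragraph, sentence), a plain word-overlap (Jaccard) score as the heuristic at every level, and illustrative thresholds; the function names `tokens`, `jaccard`, and `compare` are hypothetical. A pair of units is examined at a finer level only if it passes the check at the coarser one, which is how unnecessary comparisons are skipped. No Arabic-specific normalization (diacritic removal, stemming, synonym lookup) is attempted here.

```python
import re


def tokens(text):
    """Split text into a set of lowercase word tokens (no stemming or normalization)."""
    return set(re.findall(r"\w+", text.lower()))


def jaccard(a, b):
    """Jaccard similarity of two sets; 0.0 if either set is empty."""
    if not a or not b:
        return 0.0
    return len(a & b) / len(a | b)


def split_paragraphs(text):
    """Paragraphs are assumed to be separated by blank lines."""
    return [p.strip() for p in text.split("\n\n") if p.strip()]


def split_sentences(paragraph):
    """Naive sentence split on Latin and Arabic end-of-sentence punctuation."""
    return [s.strip() for s in re.split(r"[.!?؟]+", paragraph) if s.strip()]


def compare(suspect, source, doc_t=0.05, para_t=0.1, sent_t=0.5):
    """Return (suspect_sentence, source_sentence, score) triples.

    Thresholds are illustrative assumptions, not values from the paper.
    """
    # Level 1: skip the whole document pair if the vocabularies barely overlap.
    if jaccard(tokens(suspect), tokens(source)) < doc_t:
        return []

    matches = []
    for sus_par in split_paragraphs(suspect):
        sus_par_tokens = tokens(sus_par)
        for src_par in split_paragraphs(source):
            # Level 2: skip paragraph pairs with too little overlap.
            if jaccard(sus_par_tokens, tokens(src_par)) < para_t:
                continue
            # Level 3: compare every sentence pair in the surviving paragraphs.
            for sus_sent in split_sentences(sus_par):
                for src_sent in split_sentences(src_par):
                    score = jaccard(tokens(sus_sent), tokens(src_sent))
                    if score >= sent_t:
                        matches.append((sus_sent, src_sent, score))
    return matches
```

Because every sentence in a surviving paragraph pair is compared with every sentence in the other, this sketch is insensitive to sentence reordering. Detecting synonym substitution would require an additional lexical resource, which is outside the scope of the sketch.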
