Using Sentence Similarity Measure for Plagiarism Detection of Arabic Documents

Plagiarism detection is a challenging task, particularly in natural language texts. Several plagiarism detection tools have been developed for various natural languages, especially English. In this paper, we propose a new plagiarism detection system devoted to Arabic text documents. The system is based on an algorithm that uses a semantic sentence similarity measure. This measure combines three components in a linear function: the lexical-based similarity (LS), computed from the common words; the semantic-based similarity (SS), which uses synonymy relationships; and the syntactico-semantic-based similarity (SSS), which exploits properties of semantic arguments, notably the semantic argument and the thematic role. The SSS component measures the semantic similarity between words that play the same syntactic role. For word-level semantic similarity, an information content-based measure estimates the degree of semantic similarity between words by exploiting ElMadar, an LMF-standardized Arabic dictionary. Experiments on student thesis reports confirmed the performance of the proposed system, which showed promising capabilities in identifying literal plagiarism and some types of intelligent plagiarism. We also demonstrate its advantages over other plagiarism detection tools, including Aplag.
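
The linear aggregation described above can be illustrated with a minimal Python sketch. It is not the system's actual implementation: the weights `alpha`, `beta`, and `gamma`, the Jaccard overlap used for LS, the best-match averaging used for SS, and the role-to-word dictionaries used for SSS are all assumptions made for illustration. The `toy_word_sim` function and its English placeholder tokens are hypothetical stand-ins for the information content-based measure computed over ElMadar.

```python
from typing import Callable, Dict, List

WordSim = Callable[[str, str], float]

# Hypothetical synonym pairs standing in for a lexical resource.
TOY_SYNONYMS = {frozenset({"student", "pupil"}), frozenset({"report", "thesis"})}


def toy_word_sim(w1: str, w2: str) -> float:
    """Assumed word similarity: 1.0 if equal, 0.8 if synonyms, else 0.0."""
    if w1 == w2:
        return 1.0
    return 0.8 if frozenset({w1, w2}) in TOY_SYNONYMS else 0.0


def lexical_similarity(words1: List[str], words2: List[str]) -> float:
    """LS, sketched as the Jaccard overlap of the two word sets."""
    s1, s2 = set(words1), set(words2)
    union = s1 | s2
    return len(s1 & s2) / len(union) if union else 0.0


def semantic_similarity(words1: List[str], words2: List[str],
                        word_sim: WordSim) -> float:
    """SS, sketched as the symmetric average of best-match word similarities."""
    def directed(src: List[str], dst: List[str]) -> float:
        if not src or not dst:
            return 0.0
        return sum(max(word_sim(w, v) for v in dst) for w in src) / len(src)
    return 0.5 * (directed(words1, words2) + directed(words2, words1))


def syntactico_semantic_similarity(roles1: Dict[str, str],
                                   roles2: Dict[str, str],
                                   word_sim: WordSim) -> float:
    """SSS, sketched by comparing only words that fill the same role."""
    all_roles = set(roles1) | set(roles2)
    if not all_roles:
        return 0.0
    shared = set(roles1) & set(roles2)
    return sum(word_sim(roles1[r], roles2[r]) for r in shared) / len(all_roles)


def sentence_similarity(words1: List[str], words2: List[str],
                        roles1: Dict[str, str], roles2: Dict[str, str],
                        word_sim: WordSim = toy_word_sim,
                        alpha: float = 0.3, beta: float = 0.4,
                        gamma: float = 0.3) -> float:
    """Linear combination of LS, SS, and SSS with assumed weights."""
    ls = lexical_similarity(words1, words2)
    ss = semantic_similarity(words1, words2, word_sim)
    sss = syntactico_semantic_similarity(roles1, roles2, word_sim)
    return alpha * ls + beta * ss + gamma * sss
```

In this sketch, two sentences that share few surface words but use synonyms in the same roles score low on LS and high on SS and SSS, which is the kind of case a purely lexical comparison would miss.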
