Fuzzy-Semantic Similarity for Automatic Multilingual Plagiarism Detection

A word may have multiple meanings or senses. This can be modeled by associating each word in a sentence with a fuzzy set of words that have a similar meaning. Such ambiguity makes plagiarism detection a hard task, especially when semantic meaning is involved, and harder still for cross-language plagiarism (CLP) detection. Arabic is known for its richness and for the diversity of its word constructions and meanings, so translating texts from or to Arabic is a complex task, and a fuzzy semantic-based approach seems to be the most suitable solution. In this paper, we propose a detailed fuzzy semantic-based similarity model for analyzing and comparing texts in CLP cases, relying on the WordNet lexical database, to detect plagiarism in documents translated from or to Arabic. A preprocessing phase is essential to produce operable data for the fuzzy process. The proposed method was applied to two texts (Arabic/English), taking into consideration the specificities of the Arabic language. The results show that the proposed method can detect 85% of the plagiarism cases.
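The idea of a fuzzy set of similar words can be illustrated with a minimal sketch. It follows a formulation commonly used in fuzzy semantic plagiarism detection, where the membership of a word in a sentence is 1 minus the product of (1 - similarity) over the words of that sentence, and sentence similarity is the average membership. This is an assumption for illustration, not necessarily the exact model of the paper. The `TOY_SIMILARITY` table and all function names are hypothetical; in the setting described above, the word-to-word scores would come from a WordNet-based measure applied after translation and preprocessing.

```python
from math import prod

# Hypothetical word-to-word similarity scores in [0, 1]. In practice these
# would be derived from WordNet after translation and preprocessing.
TOY_SIMILARITY = {
    frozenset(("car", "automobile")): 1.0,
    frozenset(("fast", "quick")): 0.9,
    frozenset(("drive", "ride")): 0.6,
}


def word_similarity(a, b):
    """Return the similarity of two words (1.0 if identical, 0.0 if unknown)."""
    if a == b:
        return 1.0
    return TOY_SIMILARITY.get(frozenset((a, b)), 0.0)


def membership(word, sentence):
    """Fuzzy degree to which `word` belongs to `sentence` (a list of words)."""
    return 1.0 - prod(1.0 - word_similarity(word, other) for other in sentence)


def fuzzy_similarity(source, candidate):
    """Average membership of the words of `source` in `candidate`."""
    if not source:
        return 0.0
    return sum(membership(w, candidate) for w in source) / len(source)


if __name__ == "__main__":
    suspicious = ["quick", "car"]
    original = ["fast", "automobile"]
    print(fuzzy_similarity(suspicious, original))
```

Note that the measure is asymmetric: it scores how well the first sentence is covered by the second. A detector would typically compare the score against a threshold, whose value is not given in the text above.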
