Hybrid Segmentation Prototype for Arabic Text-Based Documents: Towards Plagiarism Detection

This work contributes to the analysis of Arabic text-based documents for plagiarism detection. The analysis follows the triadic computation model of document similarity. The authors propose a hybrid segmentation prototype for Arabic text-based documents that chains several processing steps to generate the similarity rate between the documents of an Arabic corpus. The prototype combines two segmentation systems with a morphological analysis to produce a matrix representation suited to triadic similarity computation at three abstraction levels: documents, sentences, and words.
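As a rough illustration of the three abstraction levels, the sketch below splits a small corpus into sentences and then into words, and builds a document-by-sentence matrix and a sentence-by-word matrix. It is not the authors' prototype: the function names, the punctuation-based sentence splitter, and the Unicode-range word pattern are all assumptions made for this example, and the morphological analysis step is left out entirely.

```python
import re

# Assumption: a sentence ends at '.', '!', '?' or the Arabic question mark (U+061F).
SENTENCE_END = re.compile(r"[.!?\u061F]+")
# Assumption: a word is a run of Arabic letters and diacritics (U+0621..U+0652).
WORD = re.compile(r"[\u0621-\u0652]+")


def split_sentences(document):
    """First segmentation level (hypothetical): document -> sentences."""
    return [s.strip() for s in SENTENCE_END.split(document) if s.strip()]


def split_words(sentence):
    """Second segmentation level (hypothetical): sentence -> words."""
    return WORD.findall(sentence)


def build_matrices(corpus):
    """Return (document x sentence, sentence x word, vocabulary)."""
    # One entry per sentence occurrence: (document index, list of words).
    sentences = [(d, split_words(s))
                 for d, doc in enumerate(corpus)
                 for s in split_sentences(doc)]
    vocabulary = sorted({w for _, words in sentences for w in words})
    column = {w: j for j, w in enumerate(vocabulary)}

    doc_sentence = [[0] * len(sentences) for _ in corpus]
    sentence_word = [[0] * len(vocabulary) for _ in sentences]
    for i, (d, words) in enumerate(sentences):
        doc_sentence[d][i] = 1                # sentence i belongs to document d
        for w in words:
            sentence_word[i][column[w]] += 1  # raw word count, no stemming
    return doc_sentence, sentence_word, vocabulary


corpus = ["ذهب الولد. قرأ الولد كتابا.", "قرأ الولد كتابا؟"]
doc_sentence, sentence_word, vocabulary = build_matrices(corpus)
```

In this toy corpus the second sentence of the first document and the only sentence of the second document yield identical rows in the sentence-by-word matrix, which is the kind of overlap a similarity computation would pick up. A real pipeline would replace the surface words with lemmas or stems from a morphological analyser before counting.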
