Residual-based approach for authenticating pattern of multi-style diacritical Arabic texts

The meaning of Arabic text is highly sensitive to the exact arrangement of diacritics and related symbols. The most sensitive Arabic text available online is the digital Qur’an, the sacred book of Islam, which all Muslims, including non-Arabs, recite as part of their worship. Because of features of Arabic script such as diacritics (marks placed above or below letters), kashida (letter elongation), and other symbols, the Qur’an is written and distributed in different styles, such as Kufi, Naskh, Thuluth, and Uthmani. As social media has become part of daily life, posting Qur’anic verses downloaded from the web is common. This raises the problem of authenticating Qur’anic passages that circulate in different styles. This paper presents a residual-based approach for authenticating Uthmani and plain Qur’anic verses against one common database. The residual (difference) is obtained by comparing the Uthmani and plain Qur’anic styles with an XOR operation. Based on predefined data, the proposed approach converts Uthmani text into plain text. We then use the Tuned Boyer-Moore (BMT) exact pattern matching algorithm to verify the converted Uthmani verse against a database of plain-style Qur’anic text. Experimental results show that the proposed approach authenticates multi-style Qur’anic texts with 87.1% accuracy.
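The pipeline described above (XOR residual, conversion of Uthmani text to plain text, exact matching against a plain-style database) can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the paper's implementation: the `SUBSTITUTIONS` table is a hypothetical one-entry stand-in for the paper's predefined data, diacritics are approximated as Unicode nonspacing marks, and the matcher is the simpler Boyer-Moore-Horspool algorithm rather than Tuned BM itself. All function names are invented for the example.

```python
import unicodedata

TATWEEL = "\u0640"  # kashida (elongation) character

# Hypothetical substitution table (an assumption, not the paper's data):
# maps Uthmani-specific letters to their plain-style equivalents.
SUBSTITUTIONS = {
    "\u0671": "\u0627",  # ALEF WASLA -> ALEF
}


def xor_residual(a, b):
    """XOR the code points of two aligned strings; 0 marks identical characters."""
    if len(a) != len(b):
        raise ValueError("strings must be aligned to equal length")
    return [ord(x) ^ ord(y) for x, y in zip(a, b)]


def to_plain(text):
    """Drop nonspacing marks and kashida, then apply the substitution table."""
    out = []
    for ch in text:
        if ch == TATWEEL or unicodedata.category(ch) == "Mn":
            continue
        out.append(SUBSTITUTIONS.get(ch, ch))
    return "".join(out)


def horspool_find(text, pattern):
    """Boyer-Moore-Horspool search: index of the first match, or -1."""
    m, n = len(pattern), len(text)
    if m == 0:
        return 0
    # Shift for each pattern character except the last; the rightmost
    # occurrence wins because later entries overwrite earlier ones.
    shift = {ch: m - 1 - i for i, ch in enumerate(pattern[:-1])}
    pos = 0
    while pos <= n - m:
        if text[pos:pos + m] == pattern:
            return pos
        pos += shift.get(text[pos + m - 1], m)
    return -1


# Uthmani-style "bismi llah" with diacritics and ALEF WASLA (escaped code points).
uthmani = ("\u0628\u0650\u0633\u0652\u0645\u0650 "
           "\u0671\u0644\u0644\u0651\u064e\u0647\u0650")
# Plain-style database entry: the full basmala without diacritics.
database = ("\u0628\u0633\u0645 \u0627\u0644\u0644\u0647 "
            "\u0627\u0644\u0631\u062d\u0645\u0646 \u0627\u0644\u0631\u062d\u064a\u0645")

plain = to_plain(uthmani)
print(horspool_find(database, plain))  # prints 0
```

Converting to a single plain form before matching lets one database serve both styles, which is the design choice the abstract describes. A production system would need a complete substitution table and would likely report partial mismatches rather than a single index.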
