Approaches for preserving content integrity of sensitive online Arabic content: A survey and research challenges

Abstract: Internet usage and access to online content in different languages and formats are growing rapidly. A vast amount of the digital content available online, in a variety of formats, is sensitive with respect to writing style and the arrangement of diacritics. However, research aimed at identifying techniques suitable for preserving the integrity of such sensitive content is limited, and it remains a challenge to determine which techniques best suit which formats, such as image or binary. Preserving and verifying sensitive content is therefore an emerging problem that calls for timely solutions. The digital Holy Qur'an in Arabic is one case of such sensitive content. Because of characteristics of Arabic script such as diacritics (punctuation symbols), kashidas (extended letters) and other symbols, it is very easy to alter the original meaning of the text simply by changing the arrangement of diacritics. This article surveys the approaches presently employed to preserve and verify the content integrity of sensitive online content. We present the state of the art in content integrity verification and address the existing challenges in preserving the integrity of sensitive texts, using the digital Qur'an as a case study. The proposed taxonomy provides a classification and analysis of existing related schemes and their limitations. The paper also discusses recommendations on the expected efficiency of such approaches when applied to digital content integrity. One of the main findings is that unified approaches combining watermarking and string matching can be used to preserve the content integrity of any sensitive digital content.
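To make the diacritics problem concrete, the following minimal Python sketch shows how changing a single diacritic yields a different word while the base letters stay the same, and how an exact digest comparison combined with a diacritic-insensitive comparison can flag that case. It is an illustration under stated assumptions, not one of the surveyed schemes: the function names (`digest`, `strip_marks`, `classify`) are hypothetical, the sample word is an ordinary Arabic verb rather than Qur'anic text, and SHA-256 over NFC-normalized UTF-8 is simply one plausible choice of integrity check.

```python
import hashlib
import unicodedata

KASHIDA = "\u0640"  # ARABIC TATWEEL, the letter-extension character


def digest(text: str) -> str:
    """SHA-256 hex digest of the NFC-normalized, UTF-8 encoded text."""
    normalized = unicodedata.normalize("NFC", text)
    return hashlib.sha256(normalized.encode("utf-8")).hexdigest()


def strip_marks(text: str) -> str:
    """Drop nonspacing combining marks (Arabic diacritics) and kashidas."""
    return "".join(
        ch for ch in text
        if unicodedata.category(ch) != "Mn" and ch != KASHIDA
    )


def classify(candidate: str, reference: str) -> str:
    """Compare a candidate string against a trusted reference string."""
    if digest(candidate) == digest(reference):
        return "exact"
    if strip_marks(candidate) == strip_marks(reference):
        # Same base letters; only diacritics or kashidas differ.
        return "diacritics-or-kashida-altered"
    return "different"


# ain + fatha, lam + kasra, meem + fatha: "he knew"
reference = "\u0639\u064E\u0644\u0650\u0645\u064E"
# Same letters, but the first fatha is replaced by a damma: "it was known"
altered = "\u0639\u064F\u0644\u0650\u0645\u064E"
# Undiacritized word with a kashida inserted after the first letter
stretched = "\u0639\u0640\u0644\u0645"

print(classify(reference, reference))
print(classify(altered, reference))
print(classify(stretched, reference))
```

The exact digest catches any modification at all, while the stripped comparison helps distinguish a diacritic-level alteration from an entirely different string; a real verification system would need a trusted reference corpus and a more careful normalization policy than this sketch assumes.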
