tashkeelWAP: A Game With A Purpose For Digitizing Arabic Diacritics

Diacritics in the Arabic language are marks placed above or below Arabic letters. Their main purpose is to help readers pronounce words and to let them understand an Arabic text in its intended, correct context. The presence or absence of a diacritical mark can entirely change the meaning of Arabic text. Existing Optical Character Recognition (OCR) systems have difficulty reading Arabic letters with diacritics accurately, which lowers the quality of digitized Arabic text. We introduce “tashkeelWAP”, a web application with two games that digitize Arabic text by outsourcing the task to native Arabic-speaking players. As a by-product of gameplay, we collect possible digitizations of diacritized Arabic words that OCR systems failed to recognize.
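The claim that a diacritic can change meaning, and that losing diacritics loses information, can be illustrated at the text-encoding level. The Python sketch below is not part of tashkeelWAP. It assumes Unicode-encoded text and, as a simplification, treats only the code points U+064B to U+0652 (fathatan through sukun) as the diacritic set; the helper names `strip_diacritics` and `diacritic_names` are hypothetical. It shows that كَتَبَ (kataba, "he wrote") and كُتُب (kutub, "books") are distinct strings that become identical once their diacritics are removed, which is roughly the information that is lost when an OCR system misses the marks.

```python
import unicodedata

# Assumed diacritic set: U+064B..U+0652 (fathatan, dammatan, kasratan,
# fatha, damma, kasra, shadda, sukun). This is a simplification; it omits
# other marks such as superscript alef (U+0670).
ARABIC_DIACRITICS = {chr(cp) for cp in range(0x064B, 0x0653)}


def strip_diacritics(text: str) -> str:
    """Hypothetical helper: return text with the assumed diacritics removed."""
    return "".join(ch for ch in text if ch not in ARABIC_DIACRITICS)


def diacritic_names(text: str) -> list[str]:
    """Hypothetical helper: list Unicode names of the diacritics in text."""
    return [unicodedata.name(ch) for ch in text if ch in ARABIC_DIACRITICS]


# Same base letters (kaf, teh, beh), different diacritics.
kataba = "\u0643\u064E\u062A\u064E\u0628\u064E"  # كَتَبَ "he wrote"
kutub = "\u0643\u064F\u062A\u064F\u0628"  # كُتُب "books"

print(kataba == kutub)  # False: the diacritized forms differ
print(strip_diacritics(kataba) == strip_diacritics(kutub))  # True: both become كتب
print(diacritic_names(kutub))  # ['ARABIC DAMMA', 'ARABIC DAMMA']
```

Without its diacritics, the bare string كتب no longer tells a reader (or a downstream system) which of the two words was meant.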
