Automatic recognition of common Arabic handwritten words based on OCR and N-GRAMS

Comprehensive databases are vital for training and validating word recognition systems. To overcome the lack of offline databases of Arabic handwritten words, especially regarding the generality of the underlying vocabulary, we used a synthesis system to generate a database of common Arabic handwritten words. We then validated a new word recognition system on these synthetic handwritings to analyze the performance of its segmentation, character recognition, and error correction modules. We found that a dynamic character classifier, one that can adapt to the variations caused by segmentation, clearly improves word recognition accuracy. For error detection and correction, we used n-grams as well as the Levenshtein distance to a vocabulary of up to 50,000 valid words.
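
The error detection and correction step can be illustrated with a minimal sketch. It is not the system's actual implementation: the function names, the use of character bigrams, the boundary markers `^` and `$`, and the `max_distance` threshold are all assumptions made for illustration. A recognized word is flagged when it contains a character bigram never seen in the vocabulary, and it is then replaced by the vocabulary word with the smallest Levenshtein distance.

```python
from typing import Iterable, Optional, Set


def levenshtein(a: str, b: str) -> int:
    """Edit distance with unit-cost insertion, deletion, and substitution."""
    if len(a) < len(b):
        a, b = b, a  # keep the shorter string in the inner loop
    previous = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        current = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            current.append(min(previous[j] + 1,          # deletion
                               current[j - 1] + 1,       # insertion
                               previous[j - 1] + cost))  # substitution
        previous = current
    return previous[-1]


def char_bigrams(word: str) -> Set[str]:
    """Character bigrams of a word, with assumed boundary markers ^ and $."""
    padded = f"^{word}$"
    return {padded[i:i + 2] for i in range(len(padded) - 1)}


def build_bigram_set(vocabulary: Iterable[str]) -> Set[str]:
    """Collect every character bigram that occurs in the vocabulary."""
    known: Set[str] = set()
    for word in vocabulary:
        known |= char_bigrams(word)
    return known


def looks_invalid(word: str, known_bigrams: Set[str]) -> bool:
    """Error detection: flag a word containing an unseen bigram."""
    return not char_bigrams(word) <= known_bigrams


def correct(word: str, vocabulary: Iterable[str],
            max_distance: int = 2) -> Optional[str]:
    """Error correction: nearest vocabulary word within max_distance, else None."""
    best, best_distance = None, max_distance + 1
    for candidate in vocabulary:
        distance = levenshtein(word, candidate)
        if distance < best_distance:
            best, best_distance = candidate, distance
    return best


# Hypothetical transliterated vocabulary, for illustration only.
vocabulary = ["kitab", "maktab", "katib"]
known = build_bigram_set(vocabulary)
recognized = "kitap"
if looks_invalid(recognized, known):
    recognized = correct(recognized, vocabulary) or recognized
```

The linear scan in `correct` is adequate for a sketch; with a vocabulary of 50,000 words, an index such as a trie or BK-tree would likely be preferable, though the source does not say which data structure was used.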
