KHATT: A Deep Learning Benchmark on Arabic Script

This work presents state-of-the-art results on KHATT, a challenging dataset of Arabic handwritten text with complex patterns. We achieve better character recognition performance by applying a well-established deep learning approach based on Long Short-Term Memory (LSTM) networks. Connectionist Temporal Classification (CTC) is used as the final layer to align the predicted labels along the most probable path. A multidimensional LSTM (MDLSTM) scans each text line in all directions to capture fine variations in both the horizontal and vertical directions. Further, we pre-process the text lines to prune extra white regions and de-skew them for accurate height normalization. Together, deep learning and pre-processing improve results from 46.13% to 75.8%.
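As an illustration of how a CTC output layer can be read out along its most probable path, the following is a minimal sketch of best-path (greedy) decoding: take the highest-scoring label in each frame, merge consecutive repeats, and drop the blank label. It is not necessarily the decoder used in this work; the function name `ctc_best_path`, the blank index 0, and the toy scores are assumptions made for the example.

```python
from typing import List, Sequence

BLANK = 0  # assumed index of the CTC blank label (hypothetical choice)


def ctc_best_path(frame_scores: Sequence[Sequence[float]],
                  blank: int = BLANK) -> List[int]:
    """Greedy CTC decoding: per-frame argmax, merge repeats, drop blanks."""
    # Most probable label index for every frame.
    best = [max(range(len(frame)), key=frame.__getitem__)
            for frame in frame_scores]

    decoded: List[int] = []
    prev = None
    for label in best:
        # Keep a label only when it differs from the previous frame's
        # label and is not the blank symbol.
        if label != prev and label != blank:
            decoded.append(label)
        prev = label
    return decoded


# Toy per-frame scores over three labels: blank (0) and two characters.
toy_scores = [
    [0.10, 0.80, 0.10],
    [0.20, 0.70, 0.10],
    [0.90, 0.05, 0.05],
    [0.10, 0.60, 0.30],
    [0.10, 0.20, 0.70],
    [0.20, 0.10, 0.70],
    [0.80, 0.10, 0.10],
]
decoded_labels = ctc_best_path(toy_scores)
```

The blank frame between the two runs of label 1 is what lets the decoder emit the same character twice in a row; without it, the repeats would be merged into one.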
