A large vocabulary system for Arabic online handwriting recognition

The success of Hidden Markov Models (HMMs) in speech recognition applications has motivated their adoption for handwriting recognition, especially online handwriting, which closely resembles the speech signal as a sequential process. Some languages, such as Arabic, Farsi and Urdu, include a large number of delayed strokes: marks that belong above or below most letters and are usually written later in time than the letter body. These delayed strokes are a modeling challenge for the conventional left-right HMM that is commonly used in Automatic Speech Recognition (ASR) systems. In this paper, we introduce a new approach for handling delayed strokes in Arabic online handwriting recognition using HMMs. We also show that several modeling approaches currently used in most state-of-the-art ASR systems, such as context-based tri-grapheme models, speaker adaptive training and discriminative training, can provide similar performance improvements for Handwriting Recognition (HWR) systems. Finally, we show that a multi-pass decoder that uses the computationally less expensive models in the early passes can provide an Arabic large vocabulary HWR system with practical decoding time. We evaluated the performance of our proposed Arabic HWR system using two databases, one with a small lexicon and one with a large lexicon. For the small lexicon data set, our system achieved results competitive with the best reported state-of-the-art Arabic HWR systems. For the large lexicon, our system achieved promising results (accuracy and time) for a vocabulary size of 64k words, with the possibility of adapting the models to specific writers to get even better results.
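To make the delayed-stroke difficulty concrete, the following is a minimal sketch of a conventional left-right HMM decoded with the Viterbi algorithm. It is an illustration only, not the approach proposed in this paper: the three-state topology, the discrete symbol alphabet, and all probabilities (`p_stay`, `p_match`) are assumptions chosen for the example.

```python
import math

NEG_INF = float("-inf")


def left_right_transitions(n_states, p_stay=0.6):
    """Log transition matrix where a state may only loop or advance by one.

    p_stay is an assumed self-loop probability, not a value from the paper.
    """
    log_a = [[NEG_INF] * n_states for _ in range(n_states)]
    for i in range(n_states):
        if i + 1 < n_states:
            log_a[i][i] = math.log(p_stay)
            log_a[i][i + 1] = math.log(1.0 - p_stay)
        else:
            log_a[i][i] = 0.0  # the last state can only loop (log 1)
    return log_a


def toy_emissions(n_states, p_match=0.8):
    """Log emissions: state i mostly emits symbol i (hypothetical alphabet)."""
    p_other = (1.0 - p_match) / (n_states - 1)
    return [
        [math.log(p_match if k == i else p_other) for k in range(n_states)]
        for i in range(n_states)
    ]


def viterbi(log_a, log_b, observations):
    """Best state path that starts in state 0 and ends in the last state."""
    n = len(log_a)
    score = [NEG_INF] * n
    score[0] = log_b[0][observations[0]]
    backpointers = []
    for obs in observations[1:]:
        new_score = [NEG_INF] * n
        pointers = [0] * n
        for j in range(n):
            best_i = max(range(n), key=lambda i: score[i] + log_a[i][j])
            new_score[j] = score[best_i] + log_a[best_i][j] + log_b[j][obs]
            pointers[j] = best_i
        score = new_score
        backpointers.append(pointers)
    # Trace back from the final state to recover the path.
    state = n - 1
    path = [state]
    for pointers in reversed(backpointers):
        state = pointers[state]
        path.append(state)
    path.reverse()
    return path, score[n - 1]


log_a = left_right_transitions(3)
log_b = toy_emissions(3)

# Strokes arriving in letter order align cleanly with the states.
in_order_path, _ = viterbi(log_a, log_b, [0, 0, 1, 2, 2])

# A "delayed" symbol for the first letter arrives last; the topology
# cannot return to state 0, so it is absorbed by the final state.
delayed_path, _ = viterbi(log_a, log_b, [0, 1, 2, 0])
print(in_order_path, delayed_path)
```

In this toy setting the second sequence ends with a symbol that belongs to the first state, but the left-right topology forces the decoder to explain it from the last state at a low emission probability. This is the kind of mismatch that makes delayed strokes hard for the conventional topology.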
