DNN-HMM for Large Vocabulary Mongolian Offline Handwriting Recognition

In this paper, we propose a large-vocabulary Mongolian offline handwriting recognition system based on a hybrid deep neural network (DNN)-hidden Markov model (HMM) architecture, an approach that has shown superior performance on automatic speech recognition (ASR) tasks. We select 50 sub-characters from all shapes of the Mongolian letters as the smallest modeling units. First, a set of intensity features is extracted from each segmented word using a sliding window that moves across the word image. Then, multiple context-dependent Gaussian mixture model (GMM)-HMMs are trained on these features. Finally, a DNN with 4 hidden layers is trained as a frame classifier, where the class labels are the state labels assigned to each input frame through forced alignment with the context-dependent model. To validate the proposed model, extensive experiments were carried out on the MHW database, which contains 100,000 handwritten words in the training set, 5,000 in Test set I, and 14,085 in Test set II. The DNN-HMM trained on raw image pixels yields the best performance, with an accuracy of 97.61% on Test set I and 94.14% on Test set II.
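
The sliding-window step can be illustrated with a minimal sketch. It is not the authors' implementation: the function name `sliding_window_frames`, the `window` and `step` values, and the choice to slide down the rows (on the assumption that the window follows the vertical writing direction of traditional Mongolian script) are all hypothetical, as is the assumption that word images have been normalized to a fixed width. Each frame is simply the flattened raw pixel intensities inside the window.

```python
import numpy as np


def sliding_window_frames(image, window=4, step=2):
    """Cut a grayscale word image into overlapping frames of raw pixels.

    image  : 2-D array of shape (height, width); the window slides down the rows.
    window : number of rows covered by one frame (hypothetical value).
    step   : number of rows the window advances between frames (hypothetical value).
    Returns an array of shape (num_frames, window * width).
    """
    image = np.asarray(image, dtype=np.float32)
    height, width = image.shape

    # Pad very short images with zero rows so that at least one frame exists.
    if height < window:
        pad = np.zeros((window - height, width), dtype=np.float32)
        image = np.vstack([image, pad])
        height = window

    # One frame per window position; each frame is flattened to a vector.
    starts = range(0, height - window + 1, step)
    frames = [image[s:s + window, :].ravel() for s in starts]
    return np.stack(frames)


# Toy "word image": 10 rows by 3 columns of synthetic intensities.
word = np.arange(10 * 3, dtype=np.float32).reshape(10, 3)
frames = sliding_window_frames(word, window=4, step=2)
print(frames.shape)
```

In a hybrid system of the kind described, each such frame vector would play the role that an acoustic feature frame plays in ASR: the GMM-HMM assigns it a state label by forced alignment, and the DNN is then trained to predict that label from the frame.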
