Woodblock-Printing Mongolian Words Recognition by Bi-LSTM with Attention Mechanism

Woodblock-printing Mongolian documents are seriously degraded due to aging, which makes it difficult to segment woodblock-printing Mongolian words into individual glyphs. In this paper, a holistic recognition approach based on a sequence-to-sequence model is proposed for woodblock-printing Mongolian words. The input of the proposed model is the sequence of frames of a woodblock-printing Mongolian word. To generate this sequence of frames, each word image is first normalized to the same size and then segmented into several fragments of equal size along the writing direction. The output of the proposed model is a sequence of letters. Specifically, the proposed model contains three parts: an encoder, a decoder, and an attention network. The encoder consists of a deep neural network and a bi-directional Long Short-Term Memory (Bi-LSTM). The decoder consists of a Long Short-Term Memory (LSTM) with a softmax layer. The encoder and decoder are connected by an attention network, which can map multiple frames to one letter. Experimental results demonstrate that the proposed approach outperforms the segmentation-based method.
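
The frame extraction and the many-frames-to-one-letter role of the attention network can be illustrated with a minimal NumPy sketch. It is not the authors' implementation: the image size, the frame height, the function names, and the use of plain dot-product attention (in place of whatever scoring network the model actually uses) are all assumptions made for illustration. The writing direction is taken to be the vertical axis of the image array, since traditional Mongolian is written top to bottom.

```python
import numpy as np


def image_to_frames(word_image, frame_height):
    """Split a normalized word image into equal-size frames.

    Assumption: axis 0 of the array is the writing direction, so each
    frame is a horizontal strip of `frame_height` rows, flattened into
    a single feature vector.
    """
    height, width = word_image.shape
    if height % frame_height != 0:
        raise ValueError("normalized height must be a multiple of frame_height")
    n_frames = height // frame_height
    # Row-major reshape keeps consecutive rows together in one frame.
    return word_image.reshape(n_frames, frame_height * width)


def attention_context(encoder_states, decoder_state):
    """One attention step over the encoded frames.

    Assumption: dot-product scoring, used here only as a stand-in for
    the attention network. Returns softmax weights over frames and the
    weighted sum of encoder states (the context for one output letter).
    """
    scores = encoder_states @ decoder_state        # one score per frame
    scores = scores - scores.max()                 # numerical stability
    weights = np.exp(scores) / np.exp(scores).sum()
    context = weights @ encoder_states
    return weights, context


if __name__ == "__main__":
    rng = np.random.default_rng(0)

    # Hypothetical normalized size: 300 rows (writing direction) x 48 columns.
    word_image = rng.random((300, 48))
    frames = image_to_frames(word_image, frame_height=10)
    print(frames.shape)  # (30, 480): 30 frames, 480 pixel features each

    # Stand-ins for Bi-LSTM outputs and one decoder LSTM state (dimension 64).
    encoder_states = rng.standard_normal((frames.shape[0], 64))
    decoder_state = rng.standard_normal(64)
    weights, context = attention_context(encoder_states, decoder_state)
    print(weights.shape, context.shape)  # (30,) (64,)
```

Because the weights form a distribution over all frames, a single decoding step can draw on several adjacent frames at once, which is how multiple frames can be mapped to one letter without explicit glyph segmentation.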
