Enhancing the Mongolian Historical Document Recognition System with Multiple Knowledge-Based Strategies

This paper describes recent work on integrating multiple strategies to improve the performance of a Mongolian historical document recognition system that uses a segmentation-based scheme. We first analyze the causes of its recognition errors. On that basis, we propose three strategies derived from knowledge of the glyph characteristics of Mongolian and integrate them into glyph-unit recognition. The strategies are recognizing under-segmented and over-segmented fragments (RUOF), glyph-unit grouping (GG), and incorporating baseline information (IBI). The first strategy helps correct segmentation errors, and the other two further improve the accuracy of the classifiers. Experiments on the historical Mongolian Kanjur demonstrate that these strategies can effectively increase word recognition accuracy.
