Classical Mongolian Word Recognition in Historical Documents

Many classical Mongolian historical documents are preserved only as images, which makes them difficult to explore and retrieve. In this paper, we investigate the peculiarities of classical Mongolian documents and propose an approach to recognizing the words in them. We design an algorithm that segments each Mongolian word into several Glyph Units (GUs), each consisting of no more than three characters. We then use a three-stage method to recognize the GUs. In the first stage, a decision tree classifies all GUs into nine groups using three features of the GUs. In the second stage, the GUs in each group are classified by five independent back-propagation (BP) neural networks, each taking one of five further feature vectors of the GUs as input. In the last stage, the outputs of the five classifiers for each GU are combined to produce the final recognition result. In our experiments, the word recognition rate reaches 71%, indicating that our method is effective.
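The three recognition stages can be pictured with a minimal sketch. It is not the authors' implementation: the abstract does not specify the tree's splits, the feature definitions, or how the five outputs are combined. Here the decision tree is assumed to be a hand-built binary tree over the coarse GU features, each trained BP network is replaced by an arbitrary callable, and the five outputs are merged by majority vote (an assumed rule). The names `Leaf`, `Split`, `route`, and `recognize_gu` are hypothetical.

```python
from collections import Counter
from dataclasses import dataclass
from typing import Callable, Dict, List, Sequence, Union

Vector = Sequence[float]
# Stand-in for one trained BP network: maps a feature vector to a GU label.
Classifier = Callable[[Vector], str]


@dataclass
class Leaf:
    group: int  # index of a GU group (the abstract reports nine groups)


@dataclass
class Split:
    feature: int      # which coarse feature to test (hypothetical layout)
    threshold: float
    left: "Node"      # followed when the feature value <= threshold
    right: "Node"     # followed otherwise


Node = Union[Leaf, Split]


def route(tree: Node, coarse: Vector) -> int:
    """Stage 1: walk the decision tree until a leaf (GU group) is reached."""
    node = tree
    while isinstance(node, Split):
        node = node.left if coarse[node.feature] <= node.threshold else node.right
    return node.group


def recognize_gu(
    tree: Node,
    group_classifiers: Dict[int, List[Classifier]],
    coarse: Vector,
    fine: List[Vector],
) -> str:
    """Classify one GU with the three-stage scheme (sketch only)."""
    group = route(tree, coarse)  # stage 1: coarse grouping
    classifiers = group_classifiers[group]
    if len(classifiers) != len(fine):
        raise ValueError("need exactly one feature vector per classifier")
    # Stage 2: each group-specific classifier sees its own feature vector.
    votes = [clf(vec) for clf, vec in zip(classifiers, fine)]
    # Stage 3 (assumed rule): majority vote; Counter.most_common breaks ties
    # in favour of the label that was encountered first.
    return Counter(votes).most_common(1)[0][0]


if __name__ == "__main__":
    # Toy two-group tree splitting on coarse feature 0 (values are made up).
    tree = Split(feature=0, threshold=0.5, left=Leaf(0), right=Leaf(1))

    def by_first(label_lo: str, label_hi: str) -> Classifier:
        # Toy classifier: thresholds the first component of its input vector.
        return lambda v: label_lo if v[0] < 0.5 else label_hi

    group_classifiers = {
        0: [by_first("a", "e")] * 5,
        1: [by_first("n", "m")] * 5,
    }
    fine = [[0.2], [0.7], [0.1], [0.3], [0.9]]
    print(recognize_gu(tree, group_classifiers, coarse=[0.8], fine=fine))  # prints n
```

Routing each GU to a small group first keeps every downstream network focused on a handful of visually similar GUs; replacing the vote with a weighted or confidence-based combination would only change the last line of `recognize_gu`.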
