Bilingual OCR System for Myanmar and English Scripts with Simultaneous Recognition

Htwe Pa Pa Win, Phyo Thu Thu Khine, Khin Nwe Ni Tun

Abstract

The growing development of digital libraries worldwide raises many new challenges for document image analysis research and development. Storing a wide variety of document images in a digital library, for example cultural, technical, or historical documents written in many languages, also demands advances in present-day digital image analysis systems. When a digital library holds science and technology documents, its OCR system must be bilingual, because most of these documents are written in Myanmar combined with English letters. This paper proposes a bilingual OCR system that simultaneously recognizes printed English and Myanmar text, including a segmentation mechanism that handles the overlapping nature of Myanmar script. The effectiveness of the proposed mechanism is demonstrated by experimental results: segmentation accuracy rates, a comparison of feature extraction methods, and overall accuracy rates.
