Development of a Recognizer for Bangla Text: Present Status and Future Challenges

Optical Character Recognition (OCR), by virtue of its usefulness, has been a major research area since the 1950s. Building efficient and more accurate recognizers is now an increasingly challenging problem all over the world. There are many widely spoken languages in the world, such as Chinese, Arabic, Hindi, English, Spanish, Bangla, Russian and Japanese. Bangla is one of the most widely spoken of these, ranking 5th in the world. 21st February is observed as International Mother Language Day to pay homage to the martyrs who fought for the establishment of Bangla as the mother tongue of Bangladesh. With automation everywhere, digitizing the huge volume of Bangla documents with an efficient OCR has become a pressing issue. However, as of today no recognizer of comparable quality is available for Bangla as there is for other languages. Since the 1980s, Bangla OCR has attracted great interest, and it has now become a major research area, particularly in Bangladesh and India. A lot of work has been done on the different stages of the pattern recognition task (i.e., pre-processing, segmentation, feature extraction, classification), but there is a lack of synchronization between these works. That is why we put our effort into a comprehensive review of the current status of research on developing an all-inclusive Bangla OCR, which will enable one to understand the difficulties and challenges involved, to know how much progress has been made, and to estimate what remains to be done to arrive at a successful Bangla OCR.
