Selection of an Optimal Set of Features for Bengali Character Recognition

Feature extraction is an essential step in Optical Character Recognition. Accurate and distinguishable features play a significant role in improving the performance of a classifier. The complexity of a feature identification algorithm differs across the alphabet sets of different languages. Beyond the generic algorithms that find features for any alphabet set, such algorithms must handle the individual characteristics common to a particular alphabet set. The dominant features of one alphabet set might differ completely from those of another. Because inaccurate features may cause inefficient recognition, special attention should be given to identifying the optimal set of features for a character set. Bengali characters raise some specific issues in addition to those shared with other character sets. For example, there are about 300 basic, modified, and compound character shapes in the script, the characters in a word are topologically connected, and Bengali is an inflectional language. A literature survey shows that several authors have used different features and classification algorithms. The authors have extensively reviewed all these feature sets. To identify an optimal feature set, variability analysis is proposed here. The authors focus on the peculiarities of the Bengali alphabet set: the different uses of its characters as vowel and consonant signs, and its compound, complex, and touching characters. The authors also took care to choose easily computable features that take less time to generate. However, more attention needs to be given to choosing an efficient classifier.

Hasan Sarwar, United International University, Bangladesh
Mizanur Rahman, Institute of Science and Technology (IST), Bangladesh
Nasreen Akter, St. Francis Xavier University, Canada
Saima Hossain, LEADS Corporation Limited, Bangladesh
Sabrina Ahmed, Local Government Engineering Department (LGED), Bangladesh
Chowdhury Mofizur Rahman, United International University, Bangladesh

DOI: 10.4018/978-1-4666-3970-6.ch005
