Decision Tree Based Recognition of Bangla Text from Outdoor Scene Images

This article proposes a scheme for automatic recognition of Bangla text extracted from outdoor scene images. For extraction, we first locate the headline and then apply certain conditions to distinguish text from non-text. Removing the headline partitions the text into two zones, and we observe an association among the text symbols in these two zones. For recognition, we design a decision tree classifier with Multilayer Perceptrons (MLPs) at the leaf nodes. The root node takes all possible text symbols into account. Subsequent nodes test distinguishing features and act as two-class classifiers. At the leaf nodes only a few text symbols remain, and these are recognized using MLP classifiers. The association between the two zones makes recognition simpler and more efficient. The classifiers are trained using about 7100 samples of 52 classes. Experiments are performed on 250 images (200 scene images and 50 scanned images).
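
The classifier structure described above can be illustrated with a minimal sketch: internal nodes apply a two-class test on one feature, and each leaf hands the few remaining symbols to a small classifier. This is not the authors' implementation. The feature names (`has_vertical_bar`, `aspect_ratio`, `hole_count`), the class labels `A` to `D`, and the centroid values are all hypothetical, and the leaf classifier is a nearest-centroid stand-in used to keep the example self-contained, where the article uses a trained MLP.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Sequence, Union

Features = Dict[str, float]


@dataclass
class Leaf:
    """Leaf node: a small classifier over the few symbols that remain."""
    classify: Callable[[Features], str]


@dataclass
class Split:
    """Internal node: a two-class test on one distinguishing feature."""
    test: Callable[[Features], bool]
    if_true: "Node"
    if_false: "Node"


Node = Union[Leaf, Split]


def nearest_centroid(
    centroids: Dict[str, Sequence[float]], keys: Sequence[str]
) -> Callable[[Features], str]:
    """Stand-in leaf classifier (assumption: the article uses an MLP here)."""

    def classify(x: Features) -> str:
        def squared_distance(label: str) -> float:
            return sum((x[k] - c) ** 2 for k, c in zip(keys, centroids[label]))

        return min(centroids, key=squared_distance)

    return classify


def recognize(node: Node, x: Features) -> str:
    """Walk from the root through two-class tests, then classify at the leaf."""
    while isinstance(node, Split):
        node = node.if_true if node.test(x) else node.if_false
    return node.classify(x)


# Hypothetical toy tree: one two-class split, two leaves with two symbols each.
LEAF_KEYS = ("aspect_ratio", "hole_count")
tree = Split(
    test=lambda x: x["has_vertical_bar"] > 0.5,
    if_true=Leaf(nearest_centroid({"A": (0.5, 0.0), "B": (1.5, 1.0)}, LEAF_KEYS)),
    if_false=Leaf(nearest_centroid({"C": (0.5, 0.0), "D": (1.5, 1.0)}, LEAF_KEYS)),
)

sample = {"has_vertical_bar": 1.0, "aspect_ratio": 1.4, "hole_count": 1.0}
print(recognize(tree, sample))
```

Because each internal node only separates two groups of symbols, every leaf classifier faces a much smaller problem than a single classifier over all 52 classes would.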