Headline Based Text Extraction from Outdoor Images

The goal of this article is to design an effective scheme for extracting Bangla/Devnagari text from outdoor images. We first segment a color image using the fuzzy c-means algorithm. In Bangla/Devnagari script, text may be attached to or detached from the headlines. Hence, after segmentation, headlines are detected in each connected component using morphological operations. The components attached to or close to the detected headlines are then separated. Finally, by applying certain shape- and position-based purification criteria, we distinguish text from non-text. Our experiments on a dataset of 100 outdoor images containing Bangla and/or Devnagari text reveal satisfactory performance.
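The first two stages described above (fuzzy c-means color segmentation, then morphological headline detection) can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not specify the morphological operator, so a horizontal opening is assumed here, and the names `fuzzy_c_means`, `horizontal_opening`, `headline_rows` and the `min_fraction` threshold are hypothetical choices made for illustration only.

```python
def fuzzy_c_means(points, c=2, m=2.0, iters=50, eps=1e-9):
    """Plain fuzzy c-means on a list of equal-length numeric tuples (e.g. RGB pixels).

    Returns (centers, labels), where labels[k] is the cluster with the highest
    membership for points[k]. Initial centers are taken at evenly spaced
    positions in the input list (an assumption made to keep the sketch deterministic).
    """
    n, dim = len(points), len(points[0])
    centers = [list(points[i * (n - 1) // max(c - 1, 1)]) for i in range(c)]
    for _ in range(iters):
        # Membership update: u[k][i] is proportional to (1 / d^2) ** (1 / (m - 1)).
        u = []
        for p in points:
            inv = [(1.0 / (sum((a - b) ** 2 for a, b in zip(p, v)) + eps)) ** (1.0 / (m - 1))
                   for v in centers]
            total = sum(inv)
            u.append([w / total for w in inv])
        # Center update: mean of the points weighted by membership ** m.
        for i in range(c):
            weights = [row[i] ** m for row in u]
            wsum = sum(weights)
            centers[i] = [sum(w * p[d] for w, p in zip(weights, points)) / wsum
                          for d in range(dim)]
    labels = [max(range(c), key=lambda i: row[i]) for row in u]
    return centers, labels


def horizontal_opening(mask, width):
    """Binary opening with a 1 x width horizontal line element.

    Equivalent to keeping only horizontal runs of 1s that are at least
    `width` pixels long. `mask` is a list of rows of 0/1 values.
    """
    out = [[0] * len(row) for row in mask]
    for r, row in enumerate(mask):
        start = None
        for col, v in enumerate(list(row) + [0]):  # sentinel closes a trailing run
            if v and start is None:
                start = col
            elif not v and start is not None:
                if col - start >= width:
                    for k in range(start, col):
                        out[r][k] = 1
                start = None
    return out


def headline_rows(mask, min_fraction=0.7):
    """Rows of a component mask that contain a candidate headline.

    A row qualifies if it holds a horizontal run covering at least
    `min_fraction` of the component width (hypothetical threshold).
    """
    width = max(1, int(min_fraction * len(mask[0])))
    opened = horizontal_opening(mask, width)
    return [r for r, row in enumerate(opened) if any(row)]
```

In this sketch, `fuzzy_c_means` would be run on the pixel colors of the image, each resulting color layer would be split into connected components, and `headline_rows` would be applied to the binary mask of each component. The later steps (separating components attached or close to a headline, and the shape- and position-based purification) are not covered.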
