An approach to divide pre-detected Devanagari words from the scene images into characters

A methodology is presented for segmenting Devanagari words, extracted from scene images, into characters. Scene images include street signs, shop names, product advertisements, posters on streets, etc. Words in such images are prone to multiple sources of noise, which makes segmentation very challenging. The problem becomes more complicated when text recognition methodologies are developed for different scripts, because there is no general solution and recognizing text in some scripts can be harder than in others. A database was created in-house for this purpose. It consists of 130 samples, manually extracted from 200 natural scene images. The results obtained by applying the proposed techniques are encouraging: the average performance is 55.77%. The execution time for a typical word of size 1169 × 353 is 4.76 s. The database and the results can serve as a baseline for future researchers.
