Text detection and recognition in natural scene images

Text detection and recognition in natural scene images play an important role in image content analysis. In this paper, based on the characteristics of scene text, we propose a robust text detection and recognition method using Maximally Stable Extremal Regions (MSER) and a Support Vector Machine (SVM). Unlike end-to-end text recognition, we split the problem into a detection stage and a recognition stage. In the detection stage, to extract as many potential text regions as possible, we use MSER and color clustering to extract connected components. We then filter non-text regions from the candidate connected components using visual saliency and some prior information. Finally, we obtain word images by text line generation. In the recognition stage, we segment the word images by vertical projection and then recognize the characters in an SVM-based framework. Experimental results on a standard dataset show that, with a small amount of prior information and a simple segmentation strategy, the proposed method performs better than conventional text detection and recognition methods.
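
The vertical projection step in the recognition stage can be illustrated with a minimal sketch. This is not the authors' implementation: it assumes the word image has already been binarized into rows of 0 (background) and 1 (text) pixels, and the function name `segment_by_vertical_projection` and the `min_width` parameter are hypothetical. It counts the foreground pixels in each column and treats every run of non-empty columns as one candidate character.

```python
def segment_by_vertical_projection(binary_image, min_width=1):
    """Return (start, end) column ranges, end exclusive, of candidate characters.

    binary_image is a list of equal-length rows holding 0 (background)
    or 1 (text). min_width is an assumed noise filter, not from the paper.
    """
    if not binary_image:
        return []
    width = len(binary_image[0])
    # Vertical projection: number of foreground pixels in each column.
    profile = [sum(row[x] for row in binary_image) for x in range(width)]

    segments = []
    start = None
    for x, count in enumerate(profile):
        if count > 0 and start is None:
            # A run of text columns begins here.
            start = x
        elif count == 0 and start is not None:
            # An empty column closes the current run.
            if x - start >= min_width:
                segments.append((start, x))
            start = None
    # Close a run that reaches the right edge of the image.
    if start is not None and width - start >= min_width:
        segments.append((start, width))
    return segments


# Toy 3x8 "word image" with three strokes separated by empty columns.
word = [
    [1, 1, 0, 0, 1, 0, 1, 1],
    [1, 0, 0, 0, 1, 0, 1, 0],
    [1, 1, 0, 0, 1, 0, 1, 1],
]
print(segment_by_vertical_projection(word))  # [(0, 2), (4, 5), (6, 8)]
```

Splitting only at completely empty columns is the simplest possible rule. Touching or overlapping characters would need a threshold on the projection profile or a later merging step, which this sketch leaves out.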
