Scene Text Recognition Using Part-Based Tree-Structured Character Detection

Scene text recognition has inspired great interests from the computer vision community in recent years. In this paper, we propose a novel scene text recognition method using part-based tree-structured character detection. Different from conventional multi-scale sliding window character detection strategy, which does not make use of the character-specific structure information, we use part-based tree-structure to model each type of character so as to detect and recognize the characters at the same time. While for word recognition, we build a Conditional Random Field model on the potential character locations to incorporate the detection scores, spatial constraints and linguistic knowledge into one framework. The final word recognition result is obtained by minimizing the cost function defined on the random field. Experimental results on a range of challenging public datasets (ICDAR 2003, ICDAR 2011, SVT) demonstrate that the proposed method outperforms state-of-the-art methods significantly both for character detection and word recognition.

[1]  D. Alspach A gaussian sum approach to the multi-target identification-tracking problem , 1975, Autom..

[2]  N. Otsu A threshold selection method from gray level histograms , 1979 .

[3]  Judea Pearl,et al.  Probabilistic reasoning in intelligent systems - networks of plausible inference , 1991, Morgan Kaufmann series in representation and reasoning.

[4]  Andreas Stolcke,et al.  SRILM - an extensible language modeling toolkit , 2002, INTERSPEECH.

[5]  Simon M. Lucas,et al.  ICDAR 2003 robust reading competitions , 2003, Seventh International Conference on Document Analysis and Recognition, 2003. Proceedings..

[6]  Alan L. Yuille,et al.  Detecting and reading text in natural scenes , 2004, CVPR 2004.

[7]  Daniel P. Huttenlocher,et al.  Pictorial Structures for Object Recognition , 2004, International Journal of Computer Vision.

[8]  Toru Wakahara,et al.  Segmentation and recognition of characters in scene images using selective binarization in color space and GAT correlation , 2005, Eighth International Conference on Document Analysis and Recognition (ICDAR'05).

[9]  Bill Triggs,et al.  Histograms of oriented gradients for human detection , 2005, 2005 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR'05).

[10]  Vladimir Kolmogorov,et al.  Convergent Tree-Reweighted Message Passing for Energy Minimization , 2006, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[11]  Frédo Durand,et al.  Learning to predict where humans look , 2009, 2009 IEEE 12th International Conference on Computer Vision.

[12]  Manik Varma,et al.  Character Recognition in Natural Images , 2009, VISAPP.

[13]  David A. McAllester,et al.  Object Detection with Discriminatively Trained Part Based Models , 2010, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[14]  Kai Wang,et al.  Word Spotting in the Wild , 2010, ECCV.

[15]  Lewis D. Griffin,et al.  Multiscale Histogram of Oriented Gradient Descriptors for Robust Character Recognition , 2011, 2011 International Conference on Document Analysis and Recognition.

[16]  Kai Wang,et al.  End-to-end scene text recognition , 2011, 2011 International Conference on Computer Vision.

[17]  Andreas Dengel,et al.  ICDAR 2011 Robust Reading Competition Challenge 2: Reading Text in Scene Images , 2011, 2011 International Conference on Document Analysis and Recognition.

[18]  Yi Yang,et al.  Articulated pose estimation with flexible mixtures-of-parts , 2011, CVPR 2011.

[19]  C. V. Jawahar,et al.  Top-down and bottom-up cues for scene text recognition , 2012, 2012 IEEE Conference on Computer Vision and Pattern Recognition.

[20]  Deva Ramanan,et al.  Face detection, pose estimation, and landmark localization in the wild , 2012, 2012 IEEE Conference on Computer Vision and Pattern Recognition.

[21]  C. V. Jawahar,et al.  Scene Text Recognition using Higher Order Language Priors , 2009, BMVC.

[22]  Jiřı́ Matas,et al.  Real-time scene text localization and recognition , 2012, 2012 IEEE Conference on Computer Vision and Pattern Recognition.