Text Recognition in Natural Images using Multiclass Hough Forests

Text detection and recognition in natural images are popular yet unsolved problems in computer vision. In this paper, we propose a technique that detects and recognizes text in a unified manner by searching for words directly, without first reducing the image to text regions or individual characters. We present three contributions. First, we modify an object detection framework called Hough Forests (Gall et al., 2011) by introducing "Cross-Scale Binary Features", which compare information from the same image patch at different scales. We use this modified technique to produce likelihood maps for every text character. Second, we propose a word-formation cost function that, together with the computed likelihood maps, is used to detect and recognize text in natural images. We test our technique on the Street View House Numbers (SVHN) (Netzer et al., 2011) and ICDAR 2003 (Lucas et al., 2003) datasets. On the SVHN dataset, our algorithm outperforms recent methods and achieves comparable performance using fewer training samples. We also exceed the state-of-the-art word recognition performance on the ICDAR 2003 dataset by 4%. Our final contribution is code for generating realistic datasets of text characters.
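The abstract does not spell out how a Cross-Scale Binary Feature is computed, so the following is only a minimal sketch under stated assumptions, not the authors' implementation. It assumes the coarse scale is obtained by 2x2 block averaging and that the binary test thresholds the difference between one pixel of the patch at the fine scale and one pixel of the same patch at the coarse scale. The names `downsample2`, `cross_scale_binary_feature`, `p`, `q`, and `tau` are hypothetical and chosen for illustration.

```python
def downsample2(patch):
    """Halve the resolution by averaging non-overlapping 2x2 blocks.

    Assumption: this stands in for whatever rescaling the paper uses.
    """
    rows, cols = len(patch) // 2, len(patch[0]) // 2
    return [
        [
            (patch[2 * r][2 * c] + patch[2 * r][2 * c + 1]
             + patch[2 * r + 1][2 * c] + patch[2 * r + 1][2 * c + 1]) / 4.0
            for c in range(cols)
        ]
        for r in range(rows)
    ]


def cross_scale_binary_feature(fine, coarse, p, q, tau=0.0):
    """Hypothetical binary test comparing one patch across two scales.

    Returns 1 if the fine-scale value at p exceeds the coarse-scale
    value at q by more than tau, and 0 otherwise.
    """
    return 1 if fine[p[0]][p[1]] - coarse[q[0]][q[1]] > tau else 0


# A 4x4 grayscale patch and the same patch at half resolution.
fine_patch = [
    [0, 0, 8, 8],
    [0, 0, 8, 8],
    [4, 4, 4, 4],
    [4, 4, 4, 4],
]
coarse_patch = downsample2(fine_patch)

# Bright fine-scale pixel versus dark coarse-scale pixel.
bit_a = cross_scale_binary_feature(fine_patch, coarse_patch, (0, 2), (0, 0))
# Dark fine-scale pixel versus bright coarse-scale pixel.
bit_b = cross_scale_binary_feature(fine_patch, coarse_patch, (0, 0), (0, 1))
```

In a decision forest, a test of this kind would be evaluated at each split node, with the positions and threshold selected during training; how the paper selects them is not stated in the abstract.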

[1] Luc Van Gool, et al. Scalable multi-class object detection, 2011, CVPR 2011.

[2] Daniel P. Huttenlocher, et al. Pictorial Structures for Object Recognition, 2004, International Journal of Computer Vision.

[3] B. Ripley, et al. Pattern Recognition, 1968, Nature.

[4] Lewis D. Griffin, et al. Multiscale Histogram of Oriented Gradient Descriptors for Robust Character Recognition, 2011, 2011 International Conference on Document Analysis and Recognition.

[5] Kai Wang, et al. End-to-end scene text recognition, 2011, 2011 International Conference on Computer Vision.

[6] Simon M. Lucas, et al. ICDAR 2003 robust reading competitions, 2003, Seventh International Conference on Document Analysis and Recognition, 2003. Proceedings.

[7] C. V. Jawahar, et al. Top-down and bottom-up cues for scene text recognition, 2012, 2012 IEEE Conference on Computer Vision and Pattern Recognition.

[8] Manik Varma, et al. Character Recognition in Natural Images, 2009, VISAPP.

[9] Nobuo Ezaki, et al. Text detection from natural scene images: towards a system for visually impaired persons, 2004, Proceedings of the 17th International Conference on Pattern Recognition, 2004. ICPR 2004.

[10] Luc Van Gool, et al. Hough Forests for Object Detection, Tracking, and Action Recognition, 2011, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[11] Alan L. Yuille, et al. Detecting and reading text in natural scenes, 2004, CVPR 2004.

[12] Jin Hyung Kim, et al. Texture-Based Approach for Text Detection in Images Using Support Vector Machines and Continuously Adaptive Mean Shift Algorithm, 2003, IEEE Trans. Pattern Anal. Mach. Intell.

[13] Jiri Matas, et al. A Method for Text Localization and Recognition in Real-World Images, 2010, ACCV.

[14] Andrew Y. Ng, et al. Reading Digits in Natural Images with Unsupervised Feature Learning, 2011.