Text Localization with Hierarchical Multiple Feature Learning

In this paper, we focus on English text localization in natural scene images. We propose a hierarchical localization framework that proceeds from characters to strings to words. Unlike existing methods, which either depend on sophisticated hand-crafted features or rely on heavy learning models, our approach uses simple but effective features and learning models. For character localization, we introduce two-level character structure features and combine them with Histogram of Oriented Gradients (HOG) and Convolutional Neural Network (CNN) features. For string localization, we propose a nine-dimensional string feature that is used to verify candidate strings after characters have been grouped. For the final word localization, we learn an optimal splitting strategy based on interval cues to split strings into words. Experiments on the challenging ICDAR benchmark datasets demonstrate the effectiveness of our approach and its advantage over the compared methods.
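To make the final stage concrete, the following is a minimal sketch of splitting one string into words using interval cues, that is, the horizontal gaps between adjacent characters. It is not the learned strategy described above: the fixed `ratio` threshold, the function name `split_string_into_words`, and the representation of each character as an `(x_left, x_right)` pair are all assumptions made for illustration.

```python
from statistics import median


def split_string_into_words(boxes, ratio=2.0):
    """Split one text string into words using gaps between characters.

    boxes: list of (x_left, x_right) horizontal extents, one per character.
    ratio: hypothetical fixed threshold; a gap wider than ratio times the
           median gap starts a new word. The paper learns this decision
           instead of fixing it by hand.
    Returns a list of words, each a list of character extents.
    """
    if not boxes:
        return []
    boxes = sorted(boxes)  # left-to-right reading order
    # Interval cue: horizontal distance between neighbouring characters.
    gaps = [max(0, nxt[0] - cur[1]) for cur, nxt in zip(boxes, boxes[1:])]
    if not gaps:
        return [boxes]  # a single character forms a single word
    # Floor the typical gap at 1 so touching characters do not give a
    # zero threshold.
    typical = max(median(gaps), 1)
    words = [[boxes[0]]]
    for gap, box in zip(gaps, boxes[1:]):
        if gap > ratio * typical:
            words.append([box])    # wide gap: start a new word
        else:
            words[-1].append(box)  # narrow gap: same word
    return words


if __name__ == "__main__":
    chars = [(0, 10), (12, 22), (24, 34), (50, 60), (62, 72)]
    for word in split_string_into_words(chars):
        print(word)
```

In a learned variant, the comparison against `ratio * typical` would be replaced by a classifier applied to gap features; the surrounding grouping logic could stay the same.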
