A hierarchical recursive method for text detection in natural scene images

Text detection in natural scene images is a challenging problem in computer vision. To robustly detect varied text in complex scenes, this paper proposes a hierarchical recursive text detection method. Text in natural scenes usually does not appear in isolation but is arranged into lines for easy reading. To find all possible text lines in an image, candidate text lines are first obtained using text edge boxes and a convolutional neural network. Then, to accurately identify the true text lines, these candidates are analyzed in a hierarchical recursive architecture: for each candidate, connected component segmentation and hierarchical random field based analysis are applied recursively until the detected text line no longer changes. The resulting text lines are output as the text detection result. Experiments on the ICDAR 2003 dataset, the ICDAR 2013 dataset, and the Street View Dataset show that the hierarchical recursive architecture improves text detection performance and that the proposed method achieves state-of-the-art results in scene text detection.
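The recursive refinement step can be illustrated with a minimal sketch. It is not the implementation used in the paper: the names `refine_text_line`, `segment`, and `keep_text` are hypothetical, text lines are reduced to axis-aligned boxes, and the two toy callables merely stand in for connected component segmentation and for the hierarchical random field based analysis. Only the control flow, repeating both steps until the text line stops changing, is taken from the description above.

```python
from typing import Callable, List, Optional, Tuple

Box = Tuple[int, int, int, int]  # (x_min, y_min, x_max, y_max)


def bounding_box(boxes: List[Box]) -> Box:
    """Smallest axis-aligned box enclosing all given boxes."""
    return (
        min(b[0] for b in boxes),
        min(b[1] for b in boxes),
        max(b[2] for b in boxes),
        max(b[3] for b in boxes),
    )


def refine_text_line(
    candidate: Box,
    segment: Callable[[Box], List[Box]],          # hypothetical: components inside a line
    keep_text: Callable[[List[Box]], List[Box]],  # hypothetical: text / non-text labelling
    max_iters: int = 20,                          # assumed safety cap, not from the paper
) -> Optional[Box]:
    """Re-segment and re-analyze a candidate until its box no longer changes."""
    line = candidate
    for _ in range(max_iters):
        text_components = keep_text(segment(line))
        if not text_components:
            return None  # nothing labelled as text: reject the candidate
        refined = bounding_box(text_components)
        if refined == line:
            return line  # fixed point reached
        line = refined
    return line


# Toy stand-ins: three character-sized components and one small noise blob.
COMPONENTS: List[Box] = [(10, 10, 20, 30), (25, 10, 35, 30), (40, 10, 50, 30), (70, 5, 72, 8)]


def toy_segment(line: Box) -> List[Box]:
    """Return the components that lie entirely inside the line box."""
    return [
        c for c in COMPONENTS
        if c[0] >= line[0] and c[1] >= line[1] and c[2] <= line[2] and c[3] <= line[3]
    ]


def toy_keep_text(components: List[Box]) -> List[Box]:
    """Keep components at least 10 pixels tall (a crude text / non-text rule)."""
    return [c for c in components if c[3] - c[1] >= 10]


print(refine_text_line((0, 0, 100, 40), toy_segment, toy_keep_text))
print(refine_text_line((60, 0, 100, 40), toy_segment, toy_keep_text))
```

In this toy setting the first candidate shrinks to the box around the three character-sized components and then stays fixed, while the second candidate contains only the noise blob and is rejected.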
