Single Shot Text Detector with Rotational Prior Boxes

In this paper, we present an end-to-end trainable single shot text detector (SSTD) which can deal with arbitrary oriented texts in natural images. The SSTD detects texts in a single forward pass without any post-processing except for a simple non-maximum suppression. We propose a rotational prior box (RPB) mechanism to generate inclined proposals with orientation information, which enables the prior box fits the oriented text region better. The network performs prediction on multiple feature maps with different resolutions to handle text of various sizes and directly outputs bounding boxes. We further develop a dimension clustering strategy to select appropriate shapes for the default boxes that fit the text instances better. The proposed method is evaluated on three public datasets, namely ICDAR2015, MSRA-TD500 and ICDAR2013. Experiment results demonstrate its superiority in terms of effectiveness and efficiency over several state-of-the-art approaches.

[1]  Ali Farhadi,et al.  You Only Look Once: Unified, Real-Time Object Detection , 2015, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[2]  Wenyu Liu,et al.  Multi-oriented Text Detection with Fully Convolutional Networks , 2016, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[3]  Alan L. Yuille,et al.  Detecting and reading text in natural scenes , 2004, Proceedings of the 2004 IEEE Computer Society Conference on Computer Vision and Pattern Recognition, 2004. CVPR 2004..

[4]  Alan L. Yuille,et al.  Detecting and reading text in natural scenes , 2004, CVPR 2004.

[5]  Hojin Cho,et al.  Canny Text Detector: Fast and Robust Scene Text Localization Algorithm , 2016, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[6]  C. Lawrence Zitnick,et al.  Edge Boxes: Locating Object Proposals from Edges , 2014, ECCV.

[7]  Ernest Valveny,et al.  ICDAR 2015 competition on Robust Reading , 2015, 2015 13th International Conference on Document Analysis and Recognition (ICDAR).

[8]  Lianwen Jin,et al.  Deep Matching Prior Network: Toward Tighter Multi-oriented Text Detection , 2017, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[9]  Jiri Matas,et al.  Robust wide-baseline stereo from maximally stable extremal regions , 2004, Image Vis. Comput..

[10]  Jiri Matas,et al.  Text Localization in Real-World Images Using Efficiently Pruned Exhaustive Search , 2011, 2011 International Conference on Document Analysis and Recognition.

[11]  Shuchang Zhou,et al.  Scene Text Detection via Holistic, Multi-Channel Prediction , 2016, ArXiv.

[12]  Daijin Kim,et al.  Scene text detection with robust character candidate extraction method , 2015, 2015 13th International Conference on Document Analysis and Recognition (ICDAR).

[13]  Xiang Bai,et al.  Detecting Oriented Text in Natural Images by Linking Segments , 2017, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[14]  Xiang Bai,et al.  Scene text detection and recognition: recent advances and future trends , 2015, Frontiers of Computer Science.

[15]  Zhuowen Tu,et al.  Detecting Texts of Arbitrary Orientations in 1 Natural Images , 2012 .

[16]  Lei Sun,et al.  A robust approach for text detection from natural scene images , 2015, Pattern Recognit..

[17]  Yi Li,et al.  Orientation Robust Text Line Detection in Natural Images , 2014, 2014 IEEE Conference on Computer Vision and Pattern Recognition.

[18]  Jean-Michel Jolion,et al.  Object count/area graphs for the evaluation of object detection and segmentation algorithms , 2006, International Journal of Document Analysis and Recognition (IJDAR).

[19]  Andrew Zisserman,et al.  Reading Text in the Wild with Convolutional Neural Networks , 2014, International Journal of Computer Vision.

[20]  Wei Liu,et al.  SSD: Single Shot MultiBox Detector , 2015, ECCV.

[21]  Tao Wang,et al.  End-to-end text recognition with convolutional neural networks , 2012, Proceedings of the 21st International Conference on Pattern Recognition (ICPR2012).

[22]  Trevor Darrell,et al.  Rich Feature Hierarchies for Accurate Object Detection and Semantic Segmentation , 2013, 2014 IEEE Conference on Computer Vision and Pattern Recognition.

[23]  Weilin Huang,et al.  Text-Attentional Convolutional Neural Network for Scene Text Detection , 2015, IEEE Transactions on Image Processing.

[24]  Weilin Huang,et al.  Robust Scene Text Detection with Convolution Neural Network Induced MSER Trees , 2014, ECCV.

[25]  Martín Abadi,et al.  TensorFlow: Large-Scale Machine Learning on Heterogeneous Distributed Systems , 2016, ArXiv.

[26]  Jiri Matas,et al.  A Method for Text Localization and Recognition in Real-World Images , 2010, ACCV.

[27]  Xu-Cheng Yin,et al.  Robust Text Detection in Natural Scene Images. , 2014, IEEE transactions on pattern analysis and machine intelligence.

[28]  Xiang Bai,et al.  Symmetry-based text line detection in natural scenes , 2015, 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[29]  Kaiming He,et al.  Faster R-CNN: Towards Real-Time Object Detection with Region Proposal Networks , 2015, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[30]  Jun Zhang,et al.  Multi-Orientation Scene Text Detection with Adaptive Clustering , 2015, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[31]  Jun Guo,et al.  Text extraction from natural scene image: A survey , 2013, Neurocomputing.

[32]  Jiri Matas,et al.  FASText: Efficient Unconstrained Scene Text Detector , 2015, 2015 IEEE International Conference on Computer Vision (ICCV).

[33]  Jiri Matas,et al.  Scene Text Localization and Recognition with Oriented Stroke Detection , 2013, 2013 IEEE International Conference on Computer Vision.

[34]  Ankush Gupta,et al.  Synthetic Data for Text Localisation in Natural Images , 2016, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[35]  Jin Hyung Kim,et al.  Texture-Based Approach for Text Detection in Images Using Support Vector Machines and Continuously Adaptive Mean Shift Algorithm , 2003, IEEE Trans. Pattern Anal. Mach. Intell..

[36]  Andrew Zisserman,et al.  Very Deep Convolutional Networks for Large-Scale Image Recognition , 2014, ICLR.

[37]  Kai Wang,et al.  End-to-end scene text recognition , 2011, 2011 International Conference on Computer Vision.

[38]  Chucai Yi,et al.  Text String Detection From Natural Scenes by Structure-Based Partition and Grouping , 2011, IEEE Transactions on Image Processing.

[39]  Yonatan Wexler,et al.  Detecting text in natural scenes with stroke width transform , 2010, 2010 IEEE Computer Society Conference on Computer Vision and Pattern Recognition.

[40]  Jiřı́ Matas,et al.  Real-time scene text localization and recognition , 2012, 2012 IEEE Conference on Computer Vision and Pattern Recognition.

[41]  Jon Almazán,et al.  ICDAR 2013 Robust Reading Competition , 2013, 2013 12th International Conference on Document Analysis and Recognition.

[42]  Ali Farhadi,et al.  YOLO9000: Better, Faster, Stronger , 2016, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[43]  Weilin Huang,et al.  Text Localization in Natural Images Using Stroke Feature Transform and Text Covariance Descriptors , 2013, 2013 IEEE International Conference on Computer Vision.

[44]  Andrew Zisserman,et al.  Deep Features for Text Spotting , 2014, ECCV.

[45]  Jiri Matas,et al.  Robust wide-baseline stereo from maximally stable extremal regions , 2004, Image Vis. Comput..

[46]  Andreas Dengel,et al.  ICDAR 2011 Robust Reading Competition Challenge 2: Reading Text in Scene Images , 2011, 2011 International Conference on Document Analysis and Recognition.

[47]  Roberto Manduchi,et al.  Cascaded Segmentation-Detection Networks for Word-Level Text Spotting , 2017, 2017 14th IAPR International Conference on Document Analysis and Recognition (ICDAR).

[48]  Chen Chen,et al.  Gabor Convolutional Networks , 2017, 2018 IEEE Winter Conference on Applications of Computer Vision (WACV).

[49]  Wenyu Liu,et al.  TextBoxes: A Fast Text Detector with a Single Deep Neural Network , 2016, AAAI.

[50]  Pan He,et al.  Detecting Text in Natural Image with Connectionist Text Proposal Network , 2016, ECCV.

[51]  Fei Yin,et al.  Deep Direct Regression for Multi-oriented Scene Text Detection , 2017, 2017 IEEE International Conference on Computer Vision (ICCV).