DSRN: A Deep Scale Relationship Network for Scene Text Detection

Scene text detection has become increasingly important. However, the large variance in text scale remains the main challenge and limits the detection performance of most previous methods. To address this problem, we propose an end-to-end architecture called Deep Scale Relationship Network (DSRN), which maps multi-scale convolutional features onto a scale-invariant space to obtain uniform activation across text instances of different sizes. First, we develop a Scale-transfer module that transfers the multi-scale feature maps to a unified dimension. Because the features are heterogeneous, simply concatenating feature maps that carry multi-scale information would limit detection performance. We therefore propose a Scale Relationship module that aggregates the multi-scale information through bi-directional convolution operations. Finally, to further reduce the number of missed instances, we propose a novel Recall Loss that forces the network to focus on missed text instances by up-weighting poorly classified examples. Compared with previous approaches, DSRN handles the large scale variance efficiently, without complex hand-crafted hyperparameter settings (e.g., the scale of default boxes) or complicated post-processing. On standard datasets including ICDAR2015 and MSRA-TD500, the proposed algorithm achieves state-of-the-art performance at 8.8 FPS on ICDAR2015 and 13.3 FPS on MSRA-TD500.
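
The abstract does not give the form of the Recall Loss, so the following is only a minimal sketch of the general idea of up-weighting poorly classified text examples, not the formulation used in DSRN. The function name `upweighted_bce`, the linear weight `1 + alpha * (1 - p)`, and the parameter `alpha` are all assumptions introduced for illustration.

```python
import math


def upweighted_bce(probs, labels, alpha=2.0, eps=1e-7):
    """Mean binary cross-entropy with extra weight on poorly classified text.

    Hypothetical illustration only: the weight 1 + alpha * (1 - p) is an
    assumed form, not the Recall Loss defined in the paper.

    probs  -- predicted probabilities that each example is text, in [0, 1]
    labels -- ground-truth labels, 1 for text and 0 for background
    alpha  -- assumed strength of the up-weighting; 0 gives plain BCE
    """
    if len(probs) != len(labels) or not probs:
        raise ValueError("probs and labels must be non-empty and equal in length")

    total = 0.0
    for p, y in zip(probs, labels):
        # Clamp to keep the logarithm finite.
        p = min(max(p, eps), 1.0 - eps)
        if y == 1:
            # A text example with low predicted probability (a likely miss)
            # receives a weight close to 1 + alpha; a confident hit stays near 1.
            weight = 1.0 + alpha * (1.0 - p)
            total += -weight * math.log(p)
        else:
            # Background examples keep the ordinary cross-entropy term.
            total += -math.log(1.0 - p)
    return total / len(probs)
```

With `alpha=0` the function reduces to ordinary binary cross-entropy, so the effect of the up-weighting can be checked by comparing the two settings on the same predictions: only the text examples with low predicted probability contribute a noticeably larger loss.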
