Scale robust deep oriented-text detection network

Abstract: Text detection is a prerequisite for text recognition, and multi-oriented text detection has recently become an active research topic. Existing multi-oriented text detection methods fall short on two issues: 1) text scale varies over a wide range, and 2) the foreground and background classes are imbalanced. In this paper, we propose a scale-robust deep multi-oriented text detection model that combines the efficiency of one-stage deep detectors with accuracy comparable to that of two-stage deep text detectors. We design a feature refining block that fuses multi-scale context features so that text detection can be performed on a higher-resolution feature map. Moreover, to mitigate the foreground-background class imbalance, Focal Loss is adopted to up-weight hard-to-classify samples. We evaluate our method on four benchmark text datasets: ICDAR2013, ICDAR2015, COCO-Text and MSRA-TD500. The experimental results demonstrate that our method outperforms existing one-stage deep text detection models and is comparable to state-of-the-art text detection methods.
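To illustrate how Focal Loss up-weights hard samples relative to easy ones, the following is a minimal NumPy sketch of the standard binary form, FL(p_t) = -alpha_t (1 - p_t)^gamma log(p_t). The function names and the values alpha = 0.25 and gamma = 2 are assumptions for illustration (they are the common defaults for Focal Loss); the abstract does not state which settings the model uses.

```python
import numpy as np


def focal_loss(probs, labels, alpha=0.25, gamma=2.0, eps=1e-7):
    """Per-sample binary focal loss (illustrative sketch, not the paper's code).

    probs  : predicted probability of the text (foreground) class, shape (N,)
    labels : 1 for text, 0 for background, shape (N,)
    alpha, gamma : assumed hyperparameters, not taken from the abstract
    """
    probs = np.clip(np.asarray(probs, dtype=np.float64), eps, 1.0 - eps)
    labels = np.asarray(labels)
    # p_t is the probability assigned to the true class of each sample.
    p_t = np.where(labels == 1, probs, 1.0 - probs)
    # alpha_t balances foreground against background samples.
    alpha_t = np.where(labels == 1, alpha, 1.0 - alpha)
    # (1 - p_t)^gamma shrinks the loss of well-classified (easy) samples.
    return -alpha_t * (1.0 - p_t) ** gamma * np.log(p_t)


def cross_entropy(probs, labels, eps=1e-7):
    """Per-sample binary cross-entropy, for comparison."""
    probs = np.clip(np.asarray(probs, dtype=np.float64), eps, 1.0 - eps)
    labels = np.asarray(labels)
    p_t = np.where(labels == 1, probs, 1.0 - probs)
    return -np.log(p_t)


# Two background samples: one easy (text probability 0.05), one hard (0.9).
probs = np.array([0.05, 0.9])
labels = np.array([0, 0])

fl = focal_loss(probs, labels)
ce = cross_entropy(probs, labels)

# Fraction of the cross-entropy loss that survives the focal modulation.
print("easy sample kept fraction:", fl[0] / ce[0])
print("hard sample kept fraction:", fl[1] / ce[1])
```

The easy background sample retains only a small fraction of its cross-entropy loss, while the hard one retains most of it, so the many easy background locations contribute far less to the total loss than the few hard ones.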
