A Single-Shot Oriented Scene Text Detector with Learnable Anchors

Current regression based text detectors mainly use fixed anchors, where scales and positions can not be changed during network training. As scene texts tend to have large variation in orientations, aspect ratios and sizes, fixed anchors are insufficient to cover all varieties. This paper proposes a novel text detector with learnable anchors, named LATD. LATD contains two prediction branches. One aims to refine scales and locations of anchors according to the characteristics of scene texts. The other one receives refined anchors as defaults and regresses their offsets to text regions. These two branches are optimized jointly without sacrifices much speed. Meanwhile, we explore the class-imbalance issue between texts and backgrounds, and replace softmax loss with focal loss. Extensive experiments on both oriented and horizontal benchmarks demonstrate the effectiveness of LATD with new state-of-the-art performance. By visualizing qualitative results, as expected, LATD provides more accurate locations and lower rate of missed detections.

[1]  Fei Yin,et al.  Deep Direct Regression for Multi-oriented Scene Text Detection , 2017, 2017 IEEE International Conference on Computer Vision (ICCV).

[2]  Xiangyang Xue,et al.  Arbitrary-Oriented Scene Text Detection via Rotation Proposals , 2017, IEEE Transactions on Multimedia.

[3]  Xiang Bai,et al.  TextBoxes++: A Single-Shot Oriented Scene Text Detector , 2018, IEEE Transactions on Image Processing.

[4]  Jon Almazán,et al.  ICDAR 2013 Robust Reading Competition , 2013, 2013 12th International Conference on Document Analysis and Recognition.

[5]  Shuicheng Yan,et al.  Multi-oriented Scene Text Detection via Corner Localization and Region Segmentation , 2018, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition.

[6]  Shuchang Zhou,et al.  EAST: An Efficient and Accurate Scene Text Detector , 2017, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[7]  Ankush Gupta,et al.  Synthetic Data for Text Localisation in Natural Images , 2016, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[8]  Xiang Bai,et al.  Mask TextSpotter: An End-to-End Trainable Neural Network for Spotting Text with Arbitrary Shapes , 2018, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[9]  Shifeng Zhang,et al.  Single-Shot Refinement Neural Network for Object Detection , 2017, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition.

[10]  Wei Li,et al.  R2CNN: Rotational Region CNN for Orientation Robust Scene Text Detection , 2017, ArXiv.

[11]  Wenyu Liu,et al.  TextBoxes: A Fast Text Detector with a Single Deep Neural Network , 2016, AAAI.

[12]  Wei Liu,et al.  SSD: Single Shot MultiBox Detector , 2015, ECCV.

[13]  Gui-Song Xia,et al.  Rotation-Sensitive Regression for Oriented Scene Text Detection , 2018, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition.

[14]  Zhuowen Tu,et al.  Detecting Texts of Arbitrary Orientations in 1 Natural Images , 2012 .

[15]  Shijian Lu,et al.  Accurate Scene Text Detection through Border Semantics Awareness and Bootstrapping , 2018, ECCV.

[16]  Han Hu,et al.  WordSup: Exploiting Word Annotations for Character Based Text Detection , 2017, 2017 IEEE International Conference on Computer Vision (ICCV).

[17]  Jiri Matas,et al.  COCO-Text: Dataset and Benchmark for Text Detection and Recognition in Natural Images , 2016, ArXiv.

[18]  Xin He,et al.  TextSnake: A Flexible Representation for Detecting Text of Arbitrary Shapes , 2018, ECCV.

[19]  Xiaolin Li,et al.  Single Shot Text Detector with Regional Attention , 2017, 2017 IEEE International Conference on Computer Vision (ICCV).

[20]  Kaiming He,et al.  Focal Loss for Dense Object Detection , 2017, 2017 IEEE International Conference on Computer Vision (ICCV).

[21]  Ernest Valveny,et al.  ICDAR 2015 competition on Robust Reading , 2015, 2015 13th International Conference on Document Analysis and Recognition (ICDAR).

[22]  Lianwen Jin,et al.  Deep Matching Prior Network: Toward Tighter Multi-oriented Text Detection , 2017, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[23]  Xiang Bai,et al.  Detecting Oriented Text in Natural Images by Linking Segments , 2017, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[24]  Xuelong Li,et al.  PixelLink: Detecting Scene Text via Instance Segmentation , 2018, AAAI.