Towards Accurate Scene Text Detection with Bidirectional Feature Pyramid Network

Scene text detection, this task of detecting text from real images, is a hot research topic in the machine vision community. Most of the current research is based on an anchor box. These methods are complex in model design and time-consuming to train. In this paper, we propose a new Fully Convolutional One-Stage Object Detection (FCOS)-based text detection method that can robustly detect multioriented and multilingual text from natural scene images in a per pixel prediction approach. Our proposed text detector employs an anchor-free approach, unlike state-of-the-art text detectors that do not rely on a predefined anchor box. In order to enhance the feature representation ability of FCOS for text detection tasks, we apply the Bidirectional Feature Pyramid Network (BiFPN) as the backbone network, enhancing the model learning capacity and increasing the receptive field. We demonstrate the superior performance of our method on multioriented (ICDAR-2015, ICDAR-2017 MLT) and horizontal (ICDAR-2013) text detection benchmark tasks. Moreover, our method has an f-measure of 88.65 and 86.32 for the benchmark datasets ICDAR 2013 and ICDAR 2015, respectively, and 80.75 for the ICDAR-2017 MLT dataset.

[1]  Zhi Tang,et al.  A quadrilateral scene text detector with two-stage network architecture , 2020, Pattern Recognit..

[2]  Alan L. Yuille,et al.  OriNet: A Fully Convolutional Network for 3D Human Pose Estimation , 2018, BMVC.

[3]  Xiangyang Xue,et al.  Arbitrary-Oriented Scene Text Detection via Rotation Proposals , 2017, IEEE Transactions on Multimedia.

[4]  Chunhua Shen,et al.  Decoders Matter for Semantic Segmentation: Data-Dependent Decoding Enables Flexible Feature Aggregation , 2019, 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

[5]  Hei Law,et al.  CornerNet: Detecting Objects as Paired Keypoints , 2018, International Journal of Computer Vision.

[6]  Xiang Li,et al.  Shape Robust Text Detection With Progressive Scale Expansion Network , 2018, 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

[7]  Ross B. Girshick,et al.  Focal Loss for Dense Object Detection , 2017, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[8]  Yuan Li,et al.  Pixel-Anchor: A Fast Oriented Scene Text Detector with Combined Networks , 2018, ArXiv.

[9]  Chunheng Wang,et al.  TextEdge: Multi-oriented Scene Text Detection via Region Segmentation and Edge Classification , 2019, 2019 International Conference on Document Analysis and Recognition (ICDAR).

[10]  Dumitru Erhan,et al.  Going deeper with convolutions , 2014, 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[11]  Yuning Jiang,et al.  UnitBox: An Advanced Object Detection Network , 2016, ACM Multimedia.

[12]  Han Hu,et al.  WordSup: Exploiting Word Annotations for Character Based Text Detection , 2017, 2017 IEEE International Conference on Computer Vision (ICCV).

[13]  Xiang Bai,et al.  TextBoxes++: A Single-Shot Oriented Scene Text Detector , 2018, IEEE Transactions on Image Processing.

[14]  Yi Lin,et al.  Detecting Multi-Oriented Text with Corner-based Region Proposals , 2018, Neurocomputing.

[15]  Li Fei-Fei,et al.  ImageNet: A large-scale hierarchical image database , 2009, CVPR.

[16]  Jiri Matas,et al.  FASText: Efficient Unconstrained Scene Text Detector , 2015, 2015 IEEE International Conference on Computer Vision (ICCV).

[17]  Shijian Lu,et al.  WeText: Scene Text Detection under Weak Supervision , 2017, 2017 IEEE International Conference on Computer Vision (ICCV).

[18]  Shuchang Zhou,et al.  EAST: An Efficient and Accurate Scene Text Detector , 2017, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[19]  Andrew Zisserman,et al.  Reading Text in the Wild with Convolutional Neural Networks , 2014, International Journal of Computer Vision.

[20]  Junjie Yan,et al.  FOTS: Fast Oriented Text Spotting with a Unified Network , 2018, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition.

[21]  Hao Chen,et al.  FCOS: Fully Convolutional One-Stage Object Detection , 2019, 2019 IEEE/CVF International Conference on Computer Vision (ICCV).

[22]  Quoc V. Le,et al.  EfficientNet: Rethinking Model Scaling for Convolutional Neural Networks , 2019, ICML.

[23]  Shuchang Zhou,et al.  Scene Text Detection via Holistic, Multi-Channel Prediction , 2016, ArXiv.

[24]  Gui-Song Xia,et al.  Rotation-Sensitive Regression for Oriented Scene Text Detection , 2018, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition.

[25]  Lianwen Jin,et al.  DeepText: A Unified Framework for Text Proposal Generation and Text Detection in Natural Images , 2016, ArXiv.

[26]  Wafa Khlif,et al.  ICDAR2017 Robust Reading Challenge on Multi-Lingual Scene Text Detection and Script Identification - RRC-MLT , 2017, 2017 14th IAPR International Conference on Document Analysis and Recognition (ICDAR).

[27]  Zihan Zhou,et al.  Multi-scale FCN with Cascaded Instance Aware Segmentation for Arbitrary Oriented Word Spotting in the Wild , 2017, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[28]  Trevor Darrell,et al.  Fully Convolutional Networks for Semantic Segmentation , 2017, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[29]  Lianwen Jin,et al.  Detecting Curve Text in the Wild: New Dataset and New Solution , 2017, ArXiv.

[30]  Wei Liu,et al.  SSD: Single Shot MultiBox Detector , 2015, ECCV.

[31]  Yi Yang,et al.  DenseBox: Unifying Landmark Localization with End to End Object Detection , 2015, ArXiv.

[32]  Xiu-Shen Wei,et al.  Adversarial PoseNet: A Structure-Aware Convolutional Network for Human Pose Estimation , 2017, 2017 IEEE International Conference on Computer Vision (ICCV).

[33]  Lei Sun,et al.  Mask R-CNN With Pyramid Attention Network for Scene Text Detection , 2018, 2019 IEEE Winter Conference on Applications of Computer Vision (WACV).

[34]  Ujjwal Bhattacharya,et al.  Scale-Invariant Multi-Oriented Text Detection in Wild Scene Image , 2020, 2020 IEEE International Conference on Image Processing (ICIP).

[35]  Xiang Bai,et al.  Detecting Oriented Text in Natural Images by Linking Segments , 2017, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[36]  Ian D. Reid,et al.  Learning Depth from Single Monocular Images Using Deep Convolutional Neural Fields , 2015, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[37]  Dena Bazazian,et al.  Fully Convolutional Networks for Text Understanding in Scene Images , 2020 .

[38]  Jon Almazán,et al.  ICDAR 2013 Robust Reading Competition , 2013, 2013 12th International Conference on Document Analysis and Recognition.

[39]  Koen E. A. van de Sande,et al.  Selective Search for Object Recognition , 2013, International Journal of Computer Vision.

[40]  Wei Li,et al.  R2CNN: Rotational Region CNN for Orientation Robust Scene Text Detection , 2017, ArXiv.

[41]  Tanima Dutta,et al.  Recurrent Global Convolutional Network for Scene Text Detection , 2018, 2018 25th IEEE International Conference on Image Processing (ICIP).

[42]  Pan He,et al.  Detecting Text in Natural Image with Connectionist Text Proposal Network , 2016, ECCV.

[43]  Ke Chen,et al.  Structured Knowledge Distillation for Semantic Segmentation , 2019, 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

[44]  Xiaolin Li,et al.  Single Shot Text Detector with Regional Attention , 2017, 2017 IEEE International Conference on Computer Vision (ICCV).

[45]  François Chollet,et al.  Xception: Deep Learning with Depthwise Separable Convolutions , 2016, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[46]  Simon M. Lucas,et al.  ICDAR 2003 robust reading competitions , 2003, Seventh International Conference on Document Analysis and Recognition, 2003. Proceedings..

[47]  Chunhua Shen,et al.  Enforcing Geometric Constraints of Virtual Normal for Depth Prediction , 2019, 2019 IEEE/CVF International Conference on Computer Vision (ICCV).

[48]  Kai Chen,et al.  Real-time Scene Text Detection with Differentiable Binarization , 2019, AAAI.

[49]  Wenyu Liu,et al.  Multi-oriented Text Detection with Fully Convolutional Networks , 2016, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[50]  Jun Du,et al.  Sliding Line Point Regression for Shape Robust Scene Text Detection , 2018, 2018 24th International Conference on Pattern Recognition (ICPR).

[51]  Shuicheng Yan,et al.  Multi-oriented Scene Text Detection via Corner Localization and Region Segmentation , 2018, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition.

[52]  Gang Yu,et al.  Scene Text Detection with Supervised Pyramid Context Network , 2018, AAAI.

[53]  Qiang Guo,et al.  Robust Text Detection with Vertically-Regressed Proposal Network , 2016, ECCV Workshops.

[54]  Kaiming He,et al.  Faster R-CNN: Towards Real-Time Object Detection with Region Proposal Networks , 2015, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[55]  Hao Chen,et al.  ABCNet: Real-Time Scene Text Spotting With Adaptive Bezier-Curve Network , 2020, 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).