Comprehensive Studies for Arbitrary-shape Scene Text Detection

Numerous scene text detection methods have been proposed in recent years. Most of them declare they have achieved state-of-the-art performances. However, the performance comparison is unfair, due to lots of inconsistent settings (e.g., training data, backbone network, multi-scale feature fusion, evaluation protocols, etc.). These various settings would dissemble the pros and cons of the proposed core techniques. In this paper, we carefully examine and analyze the inconsistent settings, and propose a unified framework for the bottom-up based scene text detection methods. Under the unified framework, we ensure the consistent settings for non-core modules, and mainly investigate the representations of describing arbitrary-shape scene texts, e.g., regressing points on text contours, clustering pixels with predicted auxiliary information, grouping connected components with learned linkages, etc. With the comprehensive investigations and elaborate analyses, it not only cleans up the obstacle of understanding the performance differences between existing methods, but also reveals the advantages and disadvantages of previous models under fair comparisons.

[1]  Dongyoon Han,et al.  Character Region Awareness for Text Detection , 2019, 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

[2]  Pengfei Wang,et al.  A Single-Shot Arbitrarily-Shaped Text Detector based on Context Attended Multi-Task Learning , 2019, ACM Multimedia.

[3]  Xiang Bai,et al.  Mask TextSpotter: An End-to-End Trainable Neural Network for Spotting Text with Arbitrary Shapes , 2018, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[4]  Marcus Rohrbach,et al.  TextCaps: a Dataset for Image Captioning with Reading Comprehension , 2020, ECCV.

[5]  Hongtao Xie,et al.  CRNet: A Center-aware Representation for Detecting Text of Arbitrary Shapes , 2020, ACM Multimedia.

[6]  Xin He,et al.  TextSnake: A Flexible Representation for Detecting Text of Arbitrary Shapes , 2018, ECCV.

[7]  Xiaochun Cao,et al.  Deep Strip-Based Network with Cascade Learning for Scene Text Localization , 2017, 2017 14th IAPR International Conference on Document Analysis and Recognition (ICDAR).

[8]  Wei Feng,et al.  TextDragon: An End-to-End Framework for Arbitrary Shaped Text Spotting , 2019, 2019 IEEE/CVF International Conference on Computer Vision (ICCV).

[9]  Gang Yao,et al.  Sequential Deformation for Accurate Scene Text Detection , 2020, ECCV.

[10]  Weisi Lin,et al.  Towards Robust Curve Text Detection With Conditional Spatial Expansion , 2019, 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

[11]  Wafa Khlif,et al.  ICDAR2017 Robust Reading Challenge on Multi-Lingual Scene Text Detection and Script Identification - RRC-MLT , 2017, 2017 14th IAPR International Conference on Document Analysis and Recognition (ICDAR).

[12]  Tong Lu,et al.  Efficient and Accurate Arbitrary-Shaped Text Detection With Pixel Aggregation Network , 2019, 2019 IEEE/CVF International Conference on Computer Vision (ICCV).

[13]  Fei Yin,et al.  Deep Direct Regression for Multi-oriented Scene Text Detection , 2017, 2017 IEEE International Conference on Computer Vision (ICCV).

[14]  Dacheng Tao,et al.  Geometry-Aware Scene Text Detection with Instance Transformation Network , 2018, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition.

[15]  Linjie Xing,et al.  Convolutional Character Networks , 2019, 2019 IEEE/CVF International Conference on Computer Vision (ICCV).

[16]  Zhe Chen,et al.  TextFuseNet: Scene Text Detection with Richer Fused Features , 2020, IJCAI.

[17]  Lianwen Jin,et al.  Arbitrarily Shaped Scene Text Detection With a Mask Tightness Text Detector , 2019, IEEE Transactions on Image Processing.

[18]  Yuxin Wang,et al.  ContourNet: Taking a Further Step Toward Accurate Arbitrary-Shaped Scene Text Detection , 2020, 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

[19]  Jian Sun,et al.  Identity Mappings in Deep Residual Networks , 2016, ECCV.

[20]  Lianwen Jin,et al.  Curved scene text detection via transverse and longitudinal sequence connection , 2019, Pattern Recognit..

[21]  Cheng-Lin Liu,et al.  Arbitrary Shape Scene Text Detection With Adaptive Text Region Representation , 2019, 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

[22]  Errui Ding,et al.  Look More Than Once: An Accurate Detector for Text of Arbitrary Shapes , 2019, 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

[23]  Chee Seng Chan,et al.  Total-Text: toward orientation robustness in scene text detection , 2019, International Journal on Document Analysis and Recognition (IJDAR).

[24]  Alessandro Bissacco,et al.  Towards Unconstrained End-to-End Text Spotting , 2019, 2019 IEEE/CVF International Conference on Computer Vision (ICCV).

[25]  Xiang Bai,et al.  TextBoxes++: A Single-Shot Oriented Scene Text Detector , 2018, IEEE Transactions on Image Processing.

[26]  Kai Chen,et al.  Real-time Scene Text Detection with Differentiable Binarization , 2019, AAAI.

[27]  Wei Zhou,et al.  TextField: Learning a Deep Direction Field for Irregular Scene Text Detection , 2018, IEEE Transactions on Image Processing.

[28]  Xiang Bai,et al.  SegLink++: Detecting Dense and Arbitrary-shaped Scene Text by Instance-aware Component Grouping , 2019, Pattern Recognit..

[29]  Shuchang Zhou,et al.  EAST: An Efficient and Accurate Scene Text Detector , 2017, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[30]  Michael S. Bernstein,et al.  ImageNet Large Scale Visual Recognition Challenge , 2014, International Journal of Computer Vision.

[31]  Jian Sun,et al.  Deep Residual Learning for Image Recognition , 2015, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[32]  Lianwen Jin,et al.  Text Recognition in the Wild , 2020, ACM Comput. Surv..

[33]  Ross B. Girshick,et al.  Mask R-CNN , 2017, 1703.06870.

[34]  Jiri Matas,et al.  ICDAR2017 Robust Reading Challenge on COCO-Text , 2017, 2017 14th IAPR International Conference on Document Analysis and Recognition (ICDAR).

[35]  Xiang Li,et al.  Shape Robust Text Detection With Progressive Scale Expansion Network , 2018, 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

[36]  Gang Yu,et al.  Scene Text Detection with Supervised Pyramid Context Network , 2018, AAAI.

[37]  Fei Wu,et al.  TextRay: Contour-based Geometric Modeling for Arbitrary-shaped Scene Text Detection , 2020, ACM Multimedia.

[38]  Wenyu Liu,et al.  TextBoxes: A Fast Text Detector with a Single Deep Neural Network , 2016, AAAI.

[39]  Xiaoyong Shen,et al.  Learning Shape-Aware Embedding for Scene Text Detection , 2019, 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

[40]  Xu-Cheng Yin,et al.  Deep Relational Reasoning Graph Network for Arbitrary Shape Text Detection , 2020, 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

[41]  Fei Wu,et al.  Text Perceptron: Towards End-to-End Arbitrary-Shaped Text Spotting , 2020, AAAI.

[42]  Wenyu Liu,et al.  Multi-oriented Text Detection with Fully Convolutional Networks , 2016, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[43]  Sen Wang,et al.  TextPlace: Visual Place Recognition and Topological Localization Through Reading Scene Texts , 2019, 2019 IEEE/CVF International Conference on Computer Vision (ICCV).

[44]  Xin He,et al.  Scene Text Detection and Recognition: The Deep Learning Era , 2018, International Journal of Computer Vision.

[45]  Wei Zhang,et al.  MSR: Multi-Scale Shape Regression for Scene Text Detection , 2019, IJCAI.

[46]  Arnold W. M. Smeulders,et al.  Words Matter: Scene Text for Image Classification and Retrieval , 2017, IEEE Transactions on Multimedia.

[47]  Hao Wang,et al.  All You Need Is Boundary: Toward Arbitrary-Shaped Text Spotting , 2019, AAAI.