IoU-Related Arbitrary Shape Text Scoring Detector

As a medium of information transmission, text is widely existed in natural scenes, showing diversity in orientation, scale, font, color and shape. Accurate detection of scene text is a prerequisite for subsequent recognition. Though many previous methods have worked well on horizontal and multi-oriented text detection datasets, detecting arbitrary shape scene text still remains as a challenging problem. To solve the problem, this paper proposes an arbitrary shape scene text detection method. Based on Mask R-CNN, our method replaces the original $\ell _{1}$ -smooth loss with the proposed IoU-related loss and adds a text scoring branch to align the confidence score with the text mask IoU to make the model highly relevant to IoU, achieving the goal of improving detection performance by improving IoU directly in a simple but effective way. The proposed method is evaluated on four public datasets: CTW-1500, Total-Text, ICDAR2015 and ICDAR2017-RCTW. For curved text detection datasets CTW-1500 and Total-Text, we have reached 79.2% and 81.1% H-mean respectively, showing that the proposed method has achieved competitive performance in arbitrary scene text detection.

[1]  Xuelong Li,et al.  PixelLink: Detecting Scene Text via Instance Segmentation , 2018, AAAI.

[2]  Lionel Prevost,et al.  A cascade detector for text detection in natural scene images , 2008, 2008 19th International Conference on Pattern Recognition.

[3]  Weisi Lin,et al.  Towards Robust Curve Text Detection With Conditional Spatial Expansion , 2019, 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

[4]  Shijian Lu,et al.  GA-DAN: Geometry-Aware Domain Adaptation Network for Scene Text Detection and Recognition , 2019, 2019 IEEE/CVF International Conference on Computer Vision (ICCV).

[5]  Chew Lim Tan,et al.  Ieee Transactions on Pattern Analysis and Machine Intelligence, Manuscript Id a Laplacian Approach to Multi-oriented Text Detection in Video , 2022 .

[6]  Wei Liu,et al.  SSD: Single Shot MultiBox Detector , 2015, ECCV.

[7]  Shijian Lu,et al.  WeText: Scene Text Detection under Weak Supervision , 2017, 2017 IEEE International Conference on Computer Vision (ICCV).

[8]  Lianwen Jin,et al.  Curved scene text detection via transverse and longitudinal sequence connection , 2019, Pattern Recognit..

[9]  Han Hu,et al.  WordSup: Exploiting Word Annotations for Character Based Text Detection , 2017, 2017 IEEE International Conference on Computer Vision (ICCV).

[10]  Qiangpeng Yang,et al.  IncepText: A New Inception-Text Module with Deformable PSROI Pooling for Multi-Oriented Scene Text Detection , 2018, IJCAI.

[11]  Wei Li,et al.  R2CNN: Rotational Region CNN for Orientation Robust Scene Text Detection , 2017, ArXiv.

[12]  Shijian Lu,et al.  ICDAR2017 Competition on Reading Chinese Text in the Wild (RCTW-17) , 2017, 2017 14th IAPR International Conference on Document Analysis and Recognition (ICDAR).

[13]  Xiangyang Xue,et al.  Arbitrary-Oriented Scene Text Detection via Rotation Proposals , 2017, IEEE Transactions on Multimedia.

[14]  Jun Du,et al.  Sliding Line Point Regression for Shape Robust Scene Text Detection , 2018, 2018 24th International Conference on Pattern Recognition (ICPR).

[15]  Chee Seng Chan,et al.  Total-Text: A Comprehensive Dataset for Scene Text Detection and Recognition , 2017, 2017 14th IAPR International Conference on Document Analysis and Recognition (ICDAR).

[16]  Errui Ding,et al.  TextNet: Irregular Text Reading from Images with an End-to-End Trainable Network , 2018, ACCV.

[17]  Xin He,et al.  TextSnake: A Flexible Representation for Detecting Text of Arbitrary Shapes , 2018, ECCV.

[18]  Xiaolin Li,et al.  Single Shot Text Detector with Regional Attention , 2017, 2017 IEEE International Conference on Computer Vision (ICCV).

[19]  Ernest Valveny,et al.  ICDAR 2015 competition on Robust Reading , 2015, 2015 13th International Conference on Document Analysis and Recognition (ICDAR).

[20]  Xiang Bai,et al.  TextBoxes++: A Single-Shot Oriented Scene Text Detector , 2018, IEEE Transactions on Image Processing.

[21]  Lianwen Jin,et al.  Deep Matching Prior Network: Toward Tighter Multi-oriented Text Detection , 2017, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[22]  Xiang Bai,et al.  Detecting Oriented Text in Natural Images by Linking Segments , 2017, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[23]  Xiaoyong Shen,et al.  Learning Shape-Aware Embedding for Scene Text Detection , 2019, 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

[24]  Ankush Gupta,et al.  Synthetic Data for Text Localisation in Natural Images , 2016, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[25]  Wenyu Liu,et al.  TextBoxes: A Fast Text Detector with a Single Deep Neural Network , 2016, AAAI.

[26]  Shuchang Zhou,et al.  EAST: An Efficient and Accurate Scene Text Detector , 2017, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[27]  Gui-Song Xia,et al.  Rotation-Sensitive Regression for Oriented Scene Text Detection , 2018, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition.

[28]  Xiang Bai,et al.  Scene text detection and recognition: recent advances and future trends , 2015, Frontiers of Computer Science.

[29]  Silvio Savarese,et al.  Generalized Intersection Over Union: A Metric and a Loss for Bounding Box Regression , 2019, 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

[30]  Yuting Gao,et al.  Fused Text Segmentation Networks for Multi-oriented Scene Text Detection , 2017, 2018 24th International Conference on Pattern Recognition (ICPR).

[31]  Weisi Lin,et al.  Learning Markov Clustering Networks for Scene Text Detection , 2018, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition.

[32]  Gang Zhou,et al.  A new hybrid method to detect text in natural scene , 2011, 2011 18th IEEE International Conference on Image Processing.

[33]  Kaiming He,et al.  Mask R-CNN , 2017, 2017 IEEE International Conference on Computer Vision (ICCV).

[34]  C. V. Jawahar,et al.  Top-down and bottom-up cues for scene text recognition , 2012, 2012 IEEE Conference on Computer Vision and Pattern Recognition.

[35]  Jiri Matas,et al.  A Method for Text Localization and Recognition in Real-World Images , 2010, ACCV.

[36]  Shijian Lu,et al.  Accurate Scene Text Detection through Border Semantics Awareness and Bootstrapping , 2018, ECCV.

[37]  Kai Wang,et al.  End-to-end scene text recognition , 2011, 2011 International Conference on Computer Vision.

[38]  Kaiming He,et al.  Faster R-CNN: Towards Real-Time Object Detection with Region Proposal Networks , 2015, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[39]  Xiang Bai,et al.  SegLink++: Detecting Dense and Arbitrary-shaped Scene Text by Instance-aware Component Grouping , 2019, Pattern Recognit..

[40]  Xiang Li,et al.  Shape Robust Text Detection With Progressive Scale Expansion Network , 2018, 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

[41]  Yongchao Gong,et al.  Mask Scoring R-CNN , 2019, 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

[42]  Yonatan Wexler,et al.  Detecting text in natural scenes with stroke width transform , 2010, 2010 IEEE Computer Society Conference on Computer Vision and Pattern Recognition.

[43]  Errui Ding,et al.  Look More Than Once: An Accurate Detector for Text of Arbitrary Shapes , 2019, 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

[44]  Arnold W. M. Smeulders,et al.  Words Matter: Scene Text for Image Classification and Retrieval , 2017, IEEE Transactions on Multimedia.

[45]  Cheng-Lin Liu,et al.  Arbitrary Shape Scene Text Detection With Adaptive Text Region Representation , 2019, 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

[46]  Luc Van Gool,et al.  The Pascal Visual Object Classes Challenge: A Retrospective , 2014, International Journal of Computer Vision.

[47]  Wei Zhou,et al.  TextField: Learning a Deep Direction Field for Irregular Scene Text Detection , 2018, IEEE Transactions on Image Processing.