Wacnet: Word Segmentation Guided Characters Aggregation Net for Scene Text Spotting With Arbitrary Shapes

In this paper, we propose an end-to-end trainable framework for scene text spotting which can handle text with arbitrary shapes. The proposed framework is called Word Segmentation Guided Characters Aggregation Net (WAC-Net), which consists of a shared convolutional backbone and two task-specific subnetworks. One subnetwork does word-level instance-aware segmentation (WSN) and the other does char-level detection and recognition (CDRN). The entire framework segments each word instance while detects and recognizes each character in one single forward pass. These two subnetworks are jointly trained by multi-task learning. At the inference stage, characters are aggregated into words guided by word instance segmentation results. Experiments are conducted on two datasets with arbitrary shapes, and the results demonstrate the effectiveness of the proposed method.

[1]  Yuting Gao,et al.  Fused Text Segmentation Networks for Multi-oriented Scene Text Detection , 2017, 2018 24th International Conference on Pattern Recognition (ICPR).

[2]  Kai Wang,et al.  End-to-end scene text recognition , 2011, 2011 International Conference on Computer Vision.

[3]  Jiri Matas,et al.  Deep TextSpotter: An End-to-End Trainable Scene Text Localization and Recognition Framework , 2017, 2017 IEEE International Conference on Computer Vision (ICCV).

[4]  Jon Almazán,et al.  ICDAR 2013 Robust Reading Competition , 2013, 2013 12th International Conference on Document Analysis and Recognition.

[5]  Xiang Bai,et al.  Mask TextSpotter: An End-to-End Trainable Neural Network for Spotting Text with Arbitrary Shapes , 2018, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[6]  Kaiming He,et al.  Faster R-CNN: Towards Real-Time Object Detection with Region Proposal Networks , 2015, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[7]  Jian Sun,et al.  Deep Residual Learning for Image Recognition , 2015, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[8]  Xin He,et al.  TextSnake: A Flexible Representation for Detecting Text of Arbitrary Shapes , 2018, ECCV.

[9]  Kaiming He,et al.  Mask R-CNN , 2017, 2017 IEEE International Conference on Computer Vision (ICCV).

[10]  Ankush Gupta,et al.  Synthetic Data for Text Localisation in Natural Images , 2016, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[11]  Junjie Yan,et al.  FOTS: Fast Oriented Text Spotting with a Unified Network , 2018, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition.

[12]  Xiang Bai,et al.  Detecting Oriented Text in Natural Images by Linking Segments , 2017, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[13]  Shuchang Zhou,et al.  EAST: An Efficient and Accurate Scene Text Detector , 2017, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[14]  Jean-Michel Jolion,et al.  Object count/area graphs for the evaluation of object detection and segmentation algorithms , 2006, International Journal of Document Analysis and Recognition (IJDAR).

[15]  Xiang Bai,et al.  TextBoxes++: A Single-Shot Oriented Scene Text Detector , 2018, IEEE Transactions on Image Processing.

[16]  Chee Seng Chan,et al.  Total-Text: A Comprehensive Dataset for Scene Text Detection and Recognition , 2017, 2017 14th IAPR International Conference on Document Analysis and Recognition (ICDAR).

[17]  Kaiming He,et al.  Feature Pyramid Networks for Object Detection , 2016, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[18]  Changming Sun,et al.  An End-to-End TextSpotter with Explicit Alignment and Attention , 2018, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition.

[19]  Wenyu Liu,et al.  TextBoxes: A Fast Text Detector with a Single Deep Neural Network , 2016, AAAI.

[20]  Pan He,et al.  Detecting Text in Natural Image with Connectionist Text Proposal Network , 2016, ECCV.