RecycleNet: An Overlapped Text Instance Recovery Approach

Text recognition is a key pillar of many real-world multimedia applications. Existing text recognition approaches focus on isolated instances, whose text fields are visually separated and do not interfere with each other. As a result, they cannot handle overlapped instances, which often appear in documents such as invoices, receipts, and math exercises, where a template is printed beforehand and extra content is later added on top of the existing text. In this paper, we tackle this problem by proposing RecycleNet, which automatically extracts and reconstructs overlapped instances by fully recycling the intersecting pixels that used to be obstacles to recognition. RecycleNet works alongside existing recognition systems and serves as a plug-and-play module that boosts recognition performance with no additional effort. We also release the OverlapText-500 dataset to support the design of better overlapped text recovery and recognition solutions.
