Paper Information - PERT: A Progressively Region-based Network for Scene Text Removal


Scene text removal (STR) involves two processes: text localization and background reconstruction. By integrating both processes into a single network, previous methods provide implicit erasure guidance by modifying all pixels in the entire image. However, two problems remain: 1) the implicit erasure guidance causes excessive erasure of non-text areas; 2) one-stage erasure fails to remove text regions exhaustively. In this paper, we propose a ProgrEssively Region-based scene Text eraser (PERT), which introduces explicit erasure guidance and performs balanced multi-stage erasure for accurate and exhaustive text removal. First, we introduce a new region-based modification strategy (RegionMS) to explicitly guide the erasure process. Unlike previous implicitly guided methods, RegionMS performs targeted, regional erasure on text regions only, and adaptively perceives stroke-level information to improve the integrity of non-text areas using only bounding-box-level annotations. Second, PERT performs balanced multi-stage erasure with several progressive erasing stages. Each erasing stage takes an equal step toward the text-erased image to ensure the exhaustive erasure of text regions. PERT outperforms previous methods by a large margin without the need for an adversarial loss, obtaining SOTA results with high speed (71 FPS) and at least 25% lower parameter complexity. Code is available at https://github.com/wangyuxin87/PERT.
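The two ideas in the abstract, modifying only text regions and erasing over several stages, can be illustrated with a toy sketch. This is not the authors' implementation: the names `region_composite`, `progressive_erase`, and `toy_eraser` are hypothetical, the image is a small nested list rather than a tensor, the mask is a plain bounding-box mask (the stroke-level perception described above is not modeled), and `toy_eraser` is only a stand-in for a learned erasing network.

```python
from typing import Callable, List

Image = List[List[float]]  # toy grayscale image, values in [0, 1]
Mask = List[List[int]]     # 1 inside a text bounding box, 0 elsewhere


def region_composite(original: Image, erased: Image, mask: Mask) -> Image:
    """Use erased pixels only where mask == 1; copy the original elsewhere."""
    return [
        [e if m == 1 else o for o, e, m in zip(o_row, e_row, m_row)]
        for o_row, e_row, m_row in zip(original, erased, mask)
    ]


def progressive_erase(
    image: Image,
    mask: Mask,
    eraser: Callable[[Image], Image],
    num_stages: int = 3,  # assumed value, chosen only for illustration
) -> Image:
    """Apply the same eraser repeatedly, restricting every stage to the mask."""
    current = image
    for _ in range(num_stages):
        proposal = eraser(current)  # the eraser may alter every pixel
        current = region_composite(current, proposal, mask)
    return current


def toy_eraser(image: Image) -> Image:
    """Hypothetical stand-in for a network: halve every pixel value."""
    return [[pixel * 0.5 for pixel in row] for row in image]


image = [[1.0, 0.8], [0.2, 1.0]]
mask = [[1, 0], [0, 1]]  # text on the diagonal
result = progressive_erase(image, mask, toy_eraser, num_stages=3)
print(result)
```

Because each stage is composited through the mask, pixels outside the text boxes are carried through unchanged no matter what the eraser proposes, while pixels inside the boxes are refined a little further at every stage.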

Yongdong Zhang | Hongtao Xie | Shancheng Fang | Yuxin Wang | Yadong Qu
