EraseNet: End-to-End Text Removal in the Wild

Scene text removal has attracted increasing research interests owing to its valuable applications in privacy protection, camera-based virtual reality translation, and image editing. However, existing approaches, which fall short on real applications, are mainly because they were evaluated on synthetic or unrepresentative datasets. To fill this gap and facilitate this research direction, this article proposes a real-world dataset called SCUT-EnsText that consists of 3,562 diverse images selected from public scene text reading benchmarks, and each image is scrupulously annotated to provide visually plausible erasure targets. With SCUT-EnsText, we design a novel GAN-based model termed EraseNet that can automatically remove text located on the natural images. The model is a two-stage network that consists of a coarse-erasure sub-network and a refinement sub-network. The refinement sub-network targets improvement in the feature representation and refinement of the coarse outputs to enhance the removal performance. Additionally, EraseNet contains a segmentation head for text perception and a local-global SN-Patch-GAN with spectral normalization (SN) on both the generator and discriminator for maintaining the training stability and the congruity of the erased regions. A sufficient number of experiments are conducted on both the previous public dataset and the brand-new SCUT-EnsText. Our EraseNet significantly outperforms the existing state-of-the-art methods in terms of all metrics, with remarkably superior higher-quality results. The dataset and code will be made available at https://github.com/HCIILAB/SCUT-EnsText.

[1]  Keiji Yanai,et al.  Scene Text Eraser , 2017, 2017 14th IAPR International Conference on Document Analysis and Recognition (ICDAR).

[2]  Wafa Khlif,et al.  ICDAR2017 Robust Reading Challenge on Multi-Lingual Scene Text Detection and Script Identification - RRC-MLT , 2017, 2017 14th IAPR International Conference on Document Analysis and Recognition (ICDAR).

[3]  Li Fei-Fei,et al.  Perceptual Losses for Real-Time Style Transfer and Super-Resolution , 2016, ECCV.

[4]  Roberto Manduchi,et al.  Automatic Semantic Content Removal by Learning to Neglect , 2018, BMVC.

[5]  Faisal Z. Qureshi,et al.  EdgeConnect: Structure Guided Image Inpainting using Edge Prediction , 2019, 2019 IEEE/CVF International Conference on Computer Vision Workshop (ICCVW).

[6]  Sridha Sridharan,et al.  MTRNet: A Generic Scene Text Eraser , 2019 .

[7]  Jeffrey J. Rodríguez,et al.  Image Inpainting Using Nonlocal Texture Matching and Nonlinear Filtering , 2019, IEEE Transactions on Image Processing.

[8]  Sen Liu,et al.  Progressive Image Inpainting with Full-Resolution Residual Network , 2019, ACM Multimedia.

[9]  Thomas Brox,et al.  U-Net: Convolutional Networks for Biomedical Image Segmentation , 2015, MICCAI.

[10]  Lianwen Jin,et al.  Curved scene text detection via transverse and longitudinal sequence connection , 2019, Pattern Recognit..

[11]  Lianwen Jin,et al.  Tightness-Aware Evaluation Protocol for Scene Text Detection , 2019, 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

[12]  拓海 杉山,et al.  “Unpaired Image-to-Image Translation using Cycle-Consistent Adversarial Networks”の学習報告 , 2017 .

[13]  Ting-Chun Wang,et al.  Image Inpainting for Irregular Holes Using Partial Convolutions , 2018, ECCV.

[14]  Patrick Pérez,et al.  Region filling and object removal by exemplar-based image inpainting , 2004, IEEE Transactions on Image Processing.

[15]  Enhong Chen,et al.  Image Denoising and Inpainting with Deep Neural Networks , 2012, NIPS.

[16]  Shijian Lu,et al.  ICDAR2017 Competition on Reading Chinese Text in the Wild (RCTW-17) , 2017, 2017 14th IAPR International Conference on Document Analysis and Recognition (ICDAR).

[17]  Chuan Li,et al.  Precomputed Real-Time Texture Synthesis with Markovian Generative Adversarial Networks , 2016, ECCV.

[18]  Bo Du,et al.  Progressive Reconstruction of Visual Structure for Image Inpainting , 2019, 2019 IEEE/CVF International Conference on Computer Vision (ICCV).

[19]  Léon Bottou,et al.  Wasserstein Generative Adversarial Networks , 2017, ICML.

[20]  Zhe Zhu,et al.  A Large Chinese Text Dataset in the Wild , 2019, Journal of Computer Science and Technology.

[21]  Wangmeng Zuo,et al.  Image Inpainting With Learnable Bidirectional Attention Maps , 2019, 2019 IEEE/CVF International Conference on Computer Vision (ICCV).

[22]  Ankush Gupta,et al.  Synthetic Data for Text Localisation in Natural Images , 2016, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[23]  Kai Wang,et al.  Word Spotting in the Wild , 2010, ECCV.

[24]  Seyed-Ahmad Ahmadi,et al.  V-Net: Fully Convolutional Neural Networks for Volumetric Medical Image Segmentation , 2016, 2016 Fourth International Conference on 3D Vision (3DV).

[25]  Hiroshi Ishikawa,et al.  Globally and locally consistent image completion , 2017, ACM Trans. Graph..

[26]  Jon Almazán,et al.  ICDAR 2013 Robust Reading Competition , 2013, 2013 12th International Conference on Document Analysis and Recognition.

[27]  Li Fei-Fei,et al.  ImageNet: A large-scale hierarchical image database , 2009, CVPR.

[28]  Yuichi Yoshida,et al.  Spectral Normalization for Generative Adversarial Networks , 2018, ICLR.

[29]  Yoshua Bengio,et al.  Generative Adversarial Nets , 2014, NIPS.

[30]  Eero P. Simoncelli,et al.  Image quality assessment: from error visibility to structural similarity , 2004, IEEE Transactions on Image Processing.

[31]  Jiri Matas,et al.  COCO-Text: Dataset and Benchmark for Text Detection and Recognition in Natural Images , 2016, ArXiv.

[32]  Alexei A. Efros,et al.  Image-to-Image Translation with Conditional Adversarial Networks , 2016, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[33]  Hao Li,et al.  High-Resolution Image Inpainting Using Multi-scale Neural Patch Synthesis , 2016, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[34]  Sekhar Mandal,et al.  A Group-Based Image Inpainting Using Patch Refinement in MRF Framework , 2018, IEEE Transactions on Image Processing.

[35]  Wafa Khlif,et al.  ICDAR2019 Robust Reading Challenge on Multi-lingual Scene Text Detection and Recognition — RRC-MLT-2019 , 2019, 2019 International Conference on Document Analysis and Recognition (ICDAR).

[36]  Ernest Valveny,et al.  ICDAR 2015 competition on Robust Reading , 2015, 2015 13th International Conference on Document Analysis and Recognition (ICDAR).

[37]  Errui Ding,et al.  Chinese Street View Text: Large-Scale Chinese Text Reading With Partially Supervised Learning , 2019, 2019 IEEE/CVF International Conference on Computer Vision (ICCV).

[38]  Sridha Sridharan,et al.  Component-Based Attention for Large-Scale Trademark Retrieval , 2018, IEEE Transactions on Information Forensics and Security.

[39]  Leon A. Gatys,et al.  Image Style Transfer Using Convolutional Neural Networks , 2016, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[40]  Adam Finkelstein,et al.  PatchMatch: a randomized correspondence algorithm for structural image editing , 2009, SIGGRAPH 2009.

[41]  Liang Wu,et al.  Editing Text in the Wild , 2019, ACM Multimedia.

[42]  Zhuowen Tu,et al.  Detecting Texts of Arbitrary Orientations in 1 Natural Images , 2012 .

[43]  Andrew Zisserman,et al.  Reading Text in the Wild with Convolutional Neural Networks , 2014, International Journal of Computer Vision.

[44]  Xiang Li,et al.  Shape Robust Text Detection With Progressive Scale Expansion Network , 2018, 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

[45]  Aaron C. Courville,et al.  Improved Training of Wasserstein GANs , 2017, NIPS.

[46]  Thomas S. Huang,et al.  Generative Image Inpainting with Contextual Attention , 2018, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition.

[47]  Chee Seng Chan,et al.  Total-Text: A Comprehensive Dataset for Scene Text Detection and Recognition , 2017, 2017 14th IAPR International Conference on Document Analysis and Recognition (ICDAR).

[48]  Thomas S. Huang,et al.  Free-Form Image Inpainting With Gated Convolution , 2018, 2019 IEEE/CVF International Conference on Computer Vision (ICCV).

[49]  Lianwen Jin,et al.  EnsNet: Ensconce Text in the Wild , 2018, AAAI.

[50]  Iasonas Kokkinos,et al.  DeepLab: Semantic Image Segmentation with Deep Convolutional Nets, Atrous Convolution, and Fully Connected CRFs , 2016, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[51]  Simon Osindero,et al.  Conditional Generative Adversarial Nets , 2014, ArXiv.

[52]  Andrew Zisserman,et al.  Very Deep Convolutional Networks for Large-Scale Image Recognition , 2014, ICLR.

[53]  Dongyoon Han,et al.  Character Region Awareness for Text Detection , 2019, 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

[54]  Lianwen Jin,et al.  ICDAR2019 Robust Reading Challenge on Arbitrary-Shaped Text - RRC-ArT , 2019, 2019 International Conference on Document Analysis and Recognition (ICDAR).

[55]  Zongben Xu,et al.  Image Inpainting by Patch Propagation Using Patch Sparsity , 2010, IEEE Transactions on Image Processing.

[56]  Alexei A. Efros,et al.  Context Encoders: Feature Learning by Inpainting , 2016, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[57]  Jian Sun,et al.  Deep Residual Learning for Image Recognition , 2015, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).