MTRNet++: One-stage Mask-based Scene Text Eraser

Abstract A precise, controllable, interpretable and easily trainable text removal approach is necessary for both user-specific and large-scale text removal applications. To achieve this, we propose a one-stage mask-based text inpainting network, MTRNet++. It has a novel architecture that includes mask-refine, coarse-inpainting and fine-inpainting branches, and attention blocks. With this architecture, MTRNet++ can remove text either with or without an external mask. It achieves state-of-the-art results on both the Oxford and SCUT datasets without using external ground-truth masks. The results of ablation studies demonstrate that the proposed multi-branch architecture with attention blocks is effective and essential. It also demonstrates controllability and interpretability.

[1]  Rob Fergus,et al.  Restoring an Image Taken through a Window Covered with Dirt or Rain , 2013, 2013 IEEE International Conference on Computer Vision.

[2]  D. R. Patil,et al.  Text detection and removal from image using inpainting with smoothing , 2015, 2015 International Conference on Pervasive Computing (ICPC).

[3]  Ankush Gupta,et al.  Synthetic Data for Text Localisation in Natural Images , 2016, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[4]  Youngjoo Jo,et al.  SC-FEGAN: Face Editing Generative Adversarial Network With User’s Sketch and Color , 2019, 2019 IEEE/CVF International Conference on Computer Vision (ICCV).

[5]  Wenhan Yang,et al.  Attentive Generative Adversarial Network for Raindrop Removal from A Single Image , 2017, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition.

[6]  Li Fei-Fei,et al.  Perceptual Losses for Real-Time Style Transfer and Super-Resolution , 2016, ECCV.

[7]  Deniz Erdogmus,et al.  Tversky Loss Function for Image Segmentation Using 3D Fully Convolutional Deep Networks , 2017, MLMI@MICCAI.

[8]  Hiroshi Ishikawa,et al.  Globally and locally consistent image completion , 2017, ACM Trans. Graph..

[9]  Andrew L. Maas Rectifier Nonlinearities Improve Neural Network Acoustic Models , 2013 .

[10]  Yuichi Yoshida,et al.  Spectral Normalization for Generative Adversarial Networks , 2018, ICLR.

[11]  Vladlen Koltun,et al.  Multi-Scale Context Aggregation by Dilated Convolutions , 2015, ICLR.

[12]  Sridha Sridharan,et al.  MTRNet: A Generic Scene Text Eraser , 2019 .

[13]  Alexei A. Efros,et al.  Image-to-Image Translation with Conditional Adversarial Networks , 2016, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[14]  Hao Li,et al.  High-Resolution Image Inpainting Using Multi-scale Neural Patch Synthesis , 2016, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[15]  A. Behrad,et al.  Text localization, extraction and inpainting in color images , 2012, 20th Iranian Conference on Electrical Engineering (ICEE2012).

[16]  Jon Almazán,et al.  ICDAR 2013 Robust Reading Competition , 2013, 2013 12th International Conference on Document Analysis and Recognition.

[17]  Thomas S. Huang,et al.  Free-Form Image Inpainting With Gated Convolution , 2018, 2019 IEEE/CVF International Conference on Computer Vision (ICCV).

[18]  Mehran Ebrahimi,et al.  EdgeConnect: Generative Image Inpainting with Adversarial Edge Learning , 2019, ArXiv.

[19]  Lianwen Jin,et al.  EnsNet: Ensconce Text in the Wild , 2018, AAAI.

[20]  Ting-Chun Wang,et al.  Image Inpainting for Irregular Holes Using Partial Convolutions , 2018, ECCV.

[21]  Keiji Yanai,et al.  Scene Text Eraser , 2017, 2017 14th IAPR International Conference on Document Analysis and Recognition (ICDAR).

[22]  Wafa Khlif,et al.  ICDAR2017 Robust Reading Challenge on Multi-Lingual Scene Text Detection and Script Identification - RRC-MLT , 2017, 2017 14th IAPR International Conference on Document Analysis and Recognition (ICDAR).

[23]  Lei Wang,et al.  Coarse-to-Fine Image Inpainting via Region-wise Convolutions and Non-Local Correlation , 2019, IJCAI.

[24]  Andrew Zisserman,et al.  Very Deep Convolutional Networks for Large-Scale Image Recognition , 2014, ICLR.

[25]  Dongyoon Han,et al.  Character Region Awareness for Text Detection , 2019, 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

[26]  Hideki Nakayama,et al.  Erasing Scene Text with Weak Supervision , 2020, 2020 IEEE Winter Conference on Applications of Computer Vision (WACV).

[27]  Andrea Vedaldi,et al.  Improved Texture Networks: Maximizing Quality and Diversity in Feed-Forward Stylization and Texture Synthesis , 2017, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[28]  Jian Sun,et al.  Deep Residual Learning for Image Recognition , 2015, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[29]  Yu-Bin Yang,et al.  Image Restoration Using Convolutional Auto-encoders with Symmetric Skip Connections , 2016, ArXiv.

[30]  Gonen Eren,et al.  Evaluation of video activity localizations integrating quality and quantity measurements , 2014, Comput. Vis. Image Underst..

[31]  Thomas S. Huang,et al.  Generative Image Inpainting with Contextual Attention , 2018, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition.

[32]  Sridha Sridharan,et al.  Component-Based Attention for Large-Scale Trademark Retrieval , 2018, IEEE Transactions on Information Forensics and Security.