CM-GAN: Image Inpainting with Cascaded Modulation GAN and Object-Aware Training

Recent image inpainting methods have made great progress but often struggle to generate plausible image structures when dealing with large holes in complex images. This is partially due to the lack of effective network structures that can capture both the long-range dependency and high-level semantics of an image. We propose cascaded modulation GAN (CM-GAN), a new network design consisting of an encoder with Fourier convolution blocks that extract multi-scale feature representations from the input image with holes and a dual-stream decoder with a novel cascaded global-spatial modulation block at each scale level. In each decoder block, global modulation is first applied to perform coarse and semantic-aware structure synthesis, followed by spatial modulation to further adjust the feature map in a spatially adaptive fashion. In addition, we design an object-aware training scheme to prevent the network from hallucinating new objects inside holes, fulfilling the needs of object removal tasks in real-world scenarios. Extensive experiments are conducted to show that our method significantly outperforms existing methods in both quantitative and qualitative evaluation. Please refer to the project page: \url{https://github.com/htzheng/CM-GAN-Inpainting}.

[1]  Victor Lempitsky,et al.  Resolution-robust Large Mask Inpainting with Fourier Convolutions , 2021, 2022 IEEE/CVF Winter Conference on Applications of Computer Vision (WACV).

[2]  Vishal M. Patel,et al.  CR-Fill: Generative Image Inpainting with Auxiliary Contextual Reconstruction , 2021, 2021 IEEE/CVF International Conference on Computer Vision (ICCV).

[3]  Eli Shechtman,et al.  Pose with style , 2021, ACM Trans. Graph..

[4]  Sungjoo Yoo,et al.  Exploiting Spatial Dimensions of Latent in GAN for Real-time Image Editing , 2021, 2021 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

[5]  Jing Liao,et al.  High-Fidelity Pluralistic Image Completion with Transformers , 2021, 2021 IEEE/CVF International Conference on Computer Vision (ICCV).

[6]  Shengyu Zhao,et al.  Large Scale Image Completion via Co-Modulated Generative Adversarial Networks , 2021, ICLR.

[7]  Dong Liu,et al.  Generating Diverse Structure for Image Inpainting With Hierarchical VQ-VAE , 2021, 2021 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

[8]  Xiaojuan Qi,et al.  Fully Convolutional Networks for Panoptic Segmentation , 2020, 2021 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

[9]  Jianfei Cai,et al.  TFill: Image Completion via a Transformer-Based Architecture , 2021, ArXiv.

[10]  Lu Yuan,et al.  Semantic Image Synthesis via Efficient Class-Adaptive Normalization , 2020, ArXiv.

[11]  Wei Huang,et al.  Rethinking Image Inpainting via a Mutual Encoder-Decoder with Feature Equalizations , 2020, ECCV.

[12]  Jan-Michael Frahm,et al.  One shot 3D photography , 2020, ACM Trans. Graph..

[13]  Alexei A. Efros,et al.  Swapping Autoencoder for Deep Image Manipulation , 2020, NeurIPS.

[14]  Zhe L. Lin,et al.  High-Resolution Image Inpainting with Iterative Confidence Feedback and Guided Upsampling , 2020, ECCV.

[15]  Zhan Xu,et al.  Contextual Residual Aggregation for Ultra High-Resolution Image Inpainting , 2020, 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

[16]  Luc Van Gool,et al.  SESAME: Semantic Editing of Scenes by Adding, Manipulating or Erasing Objects , 2020, ECCV.

[17]  Yong Shi,et al.  Learning to Incorporate Structure Knowledge for Image Inpainting , 2020, AAAI.

[18]  Tero Karras,et al.  Analyzing and Improving the Image Quality of StyleGAN , 2019, 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

[19]  Yadong Mu,et al.  Fast Fourier Convolution , 2020, NeurIPS.

[20]  Jiebo Luo,et al.  Example-Guided Image Synthesis Using Masked Spatial-Channel Attention and Self-supervision , 2020, ECCV.

[21]  Feng Liu,et al.  3D Ken Burns effect from a single image , 2019, ACM Trans. Graph..

[22]  Thomas H. Li,et al.  StructureFlow: Image Inpainting via Structure-Aware Appearance Flow , 2019, 2019 IEEE/CVF International Conference on Computer Vision (ICCV).

[23]  Bor-Chun Chen,et al.  Toward Realistic Image Compositing With Adversarial Learning , 2019, 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

[24]  Taesung Park,et al.  Semantic Image Synthesis With Spatially-Adaptive Normalization , 2019, 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

[25]  Wei Xiong,et al.  Foreground-Aware Image Inpainting , 2019, 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

[26]  Mehran Ebrahimi,et al.  EdgeConnect: Generative Image Inpainting with Adversarial Edge Learning , 2019, ArXiv.

[27]  Timo Aila,et al.  A Style-Based Generator Architecture for Generative Adversarial Networks , 2018, 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

[28]  Thomas S. Huang,et al.  Free-Form Image Inpainting With Gated Convolution , 2018, 2019 IEEE/CVF International Conference on Computer Vision (ICCV).

[29]  Bolei Zhou,et al.  Places: A 10 Million Image Database for Scene Recognition , 2018, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[30]  Qin Huang,et al.  SPG-Net: Segmentation Prediction and Guidance Network for Image Inpainting , 2018, BMVC.

[31]  Ting-Chun Wang,et al.  Image Inpainting for Irregular Holes Using Partial Convolutions , 2018, ECCV.

[32]  Jan Kautz,et al.  Multimodal Unsupervised Image-to-Image Translation , 2018, ECCV.

[33]  Chao Dong,et al.  Recovering Realistic Texture in Image Super-Resolution by Deep Spatial Feature Transform , 2018, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition.

[34]  Yuichi Yoshida,et al.  Spectral Normalization for Generative Adversarial Networks , 2018, ICLR.

[35]  Thomas S. Huang,et al.  Generative Image Inpainting with Contextual Attention , 2018, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition.

[36]  Sebastian Nowozin,et al.  Which Training Methods for GANs do actually Converge? , 2018, ICML.

[37]  Alexei A. Efros,et al.  The Unreasonable Effectiveness of Deep Features as a Perceptual Metric , 2018, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition.

[38]  Hiroshi Ishikawa,et al.  Globally and locally consistent image completion , 2017, ACM Trans. Graph..

[39]  Sepp Hochreiter,et al.  GANs Trained by a Two Time-Scale Update Rule Converge to a Local Nash Equilibrium , 2017, NIPS.

[40]  Aaron C. Courville,et al.  Improved Training of Wasserstein GANs , 2017, NIPS.

[41]  Serge J. Belongie,et al.  Arbitrary Style Transfer in Real-Time with Adaptive Instance Normalization , 2017, 2017 IEEE International Conference on Computer Vision (ICCV).

[42]  Hao Li,et al.  High-Resolution Image Inpainting Using Multi-scale Neural Patch Synthesis , 2016, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[43]  Raquel Urtasun,et al.  Understanding the Effective Receptive Field in Deep Convolutional Neural Networks , 2016, NIPS.

[44]  Alexei A. Efros,et al.  Context Encoders: Feature Learning by Inpainting , 2016, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[45]  Li Fei-Fei,et al.  Perceptual Losses for Real-Time Style Transfer and Super-Resolution , 2016, ECCV.

[46]  Tim Salimans,et al.  Weight Normalization: A Simple Reparameterization to Accelerate Training of Deep Neural Networks , 2016, NIPS.

[47]  Vladlen Koltun,et al.  Multi-Scale Context Aggregation by Dilated Convolutions , 2015, ICLR.

[48]  Jimmy Ba,et al.  Adam: A Method for Stochastic Optimization , 2014, ICLR.

[49]  Yoshua Bengio,et al.  Generative Adversarial Nets , 2014, NIPS.

[50]  P. D. Gawande,et al.  REGION FILLING AND OBJECT REMOVAL BY EXEMPLAR-BASED IMAGE INPAINTING , 2012 .

[51]  Natasha Gelfand,et al.  A survey of image retargeting techniques , 2010, Optical Engineering + Applications.

[52]  Eli Shechtman,et al.  PatchMatch: a randomized correspondence algorithm for structural image editing , 2009, ACM Trans. Graph..

[53]  William T. Freeman,et al.  The patch transform and its applications to image editing , 2008, 2008 IEEE Conference on Computer Vision and Pattern Recognition.

[54]  Tony F. Chan,et al.  Structure-Texture Image Decomposition—Modeling, Algorithms, and Parameter Selection , 2006, International Journal of Computer Vision.

[55]  Ramesh Raskar,et al.  Automatic image retargeting , 2005, MUM '05.

[56]  Nipun Kwatra,et al.  Texture optimization for example-based synthesis , 2005, ACM Trans. Graph..

[57]  Patrick Pérez,et al.  Region filling and object removal by exemplar-based image inpainting , 2004, IEEE Transactions on Image Processing.

[58]  Guillermo Sapiro,et al.  Simultaneous structure and texture image inpainting , 2003, 2003 IEEE Computer Society Conference on Computer Vision and Pattern Recognition, 2003. Proceedings..

[59]  Tony F. Chan,et al.  Mathematical Models for Local Nontexture Inpaintings , 2002, SIAM J. Appl. Math..

[60]  Tony F. Chan,et al.  Nontexture Inpainting by Curvature-Driven Diffusions , 2001, J. Vis. Commun. Image Represent..

[61]  Guillermo Sapiro,et al.  Filling-in by joint interpolation of vector fields and gray levels , 2001, IEEE Trans. Image Process..

[62]  F. Durand,et al.  Image-based modeling and photo editing , 2001, SIGGRAPH.

[63]  Alexei A. Efros,et al.  Image quilting for texture synthesis and transfer , 2001, SIGGRAPH.