CISI-net: Explicit Latent Content Inference and Imitated Style Rendering for Image Inpainting

Convolutional neural networks (CNNs) have shown their potential in filling large missing areas with plausible content. To address the blurriness that commonly arises in CNN-based inpainting, a typical approach is to refine the texture of the initially completed image by replacing each neural patch in the predicted region with its closest match in the known region. However, such processing may introduce undesired content changes in the predicted region, especially when the desired content does not exist in the known region. To avoid generating such incorrect content, in this paper we propose a content inference and style imitation network (CISI-net), which explicitly separates the image data into a content code and a style code. Content inference is performed in the latent space, inferring a content code for the corrupted image that resembles the one obtained from the original image. This produces more detailed content than a comparable inference procedure in the pixel domain, because the content distribution has lower dimensionality than that of the entire image. The style code, in turn, represents the rendering of the content, which is consistent over the entire image; it is integrated with the inferred content code to generate the complete image. Experiments on multiple datasets, including structural and natural images, demonstrate that our proposed approach outperforms existing ones in terms of both content accuracy and texture detail.
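The content/style separation described above can be illustrated with a minimal sketch (hypothetical function names and shapes, not the authors' implementation): the content code carries spatial structure, while the style code supplies a per-channel scale and shift used to render it uniformly over the image, in the spirit of adaptive instance normalization.

```python
import numpy as np

def render_with_style(content_code, style_gamma, style_beta, eps=1e-5):
    """Render a content code with a style code via per-channel
    scale-and-shift, one common way to impose a single, consistent
    rendering over both known and predicted regions.
    Shapes: content_code (C, H, W); style_gamma, style_beta (C,)."""
    # Normalize each channel of the content code to zero mean, unit variance,
    # stripping whatever rendering the encoder left behind.
    mu = content_code.mean(axis=(1, 2), keepdims=True)
    sigma = content_code.std(axis=(1, 2), keepdims=True)
    normalized = (content_code - mu) / (sigma + eps)
    # The style code re-scales and shifts every channel the same way
    # across the whole spatial extent, so rendering stays consistent.
    return (style_gamma[:, None, None] * normalized
            + style_beta[:, None, None])

rng = np.random.default_rng(0)
content = rng.standard_normal((8, 16, 16))    # inferred content code (C, H, W)
gamma = rng.uniform(0.5, 1.5, size=8)         # style scale, one value per channel
beta = rng.uniform(-0.5, 0.5, size=8)         # style shift, one value per channel
stylized = render_with_style(content, gamma, beta)
```

After rendering, each channel's statistics match the style code (mean ≈ `style_beta`, standard deviation ≈ `style_gamma`), which is what makes the rendering uniform over the entire image regardless of where the content was inferred.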
