Input Stage-1 : Regression Stage-2 : Contextual Copy-Pasting qi pi [ ] ... Pixel Representation

We present a simple nearest-neighbor (NN) approach that synthesizes highfrequency photorealistic images from an “incomplete” signal such as a lowresolution image, a surface normal map, or edges. Current state-of-the-art deep generative models designed for such conditional image synthesis lack two important things: (1) they are unable to generate a large set of diverse outputs, due to the mode collapse problem. (2) they are not interpretable, making it difficult to control the synthesized output. We demonstrate that NN approaches potentially address such limitations, but suffer in accuracy on small datasets. We design a simple pipeline that combines the best of both worlds: the first stage uses a convolutional neural network (CNN) to map the input to a (overly-smoothed) image, and the second stage uses a pixel-wise nearest neighbor method to map the smoothed output to multiple high-quality, high-frequency outputs in a controllable manner. Importantly, pixel-wise matching allows our method to compose novel high-frequency content by cutting-and-pasting pixels from different training exemplars. We demonstrate our approach for various input modalities, and for various domains ranging from human faces, pets, shoes, and handbags.

[1]  拓海 杉山,et al.  “Unpaired Image-to-Image Translation using Cycle-Consistent Adversarial Networks”の学習報告 , 2017 .

[2]  Vladlen Koltun,et al.  Photographic Image Synthesis with Cascaded Refinement Networks , 2017, 2017 IEEE International Conference on Computer Vision (ICCV).

[3]  Xinlei Chen,et al.  PixelNet: Representation of the pixels, by the pixels, and for the pixels , 2017, ArXiv.

[4]  Dimitris N. Metaxas,et al.  StackGAN: Text to Photo-Realistic Image Synthesis with Stacked Generative Adversarial Networks , 2016, 2017 IEEE International Conference on Computer Vision (ICCV).

[5]  Alexei A. Efros,et al.  Image-to-Image Translation with Conditional Adversarial Networks , 2016, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[6]  Christian Ledig,et al.  Photo-Realistic Single Image Super-Resolution Using a Generative Adversarial Network , 2016, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[7]  Jiajun Wu,et al.  Learning a Probabilistic Latent Space of Object Shapes via 3D Generative-Adversarial Modeling , 2016, NIPS.

[8]  Alexei A. Efros,et al.  Generative Visual Manipulation on the Natural Image Manifold , 2016, ECCV.

[9]  Pieter Abbeel,et al.  InfoGAN: Interpretable Representation Learning by Information Maximizing Generative Adversarial Nets , 2016, NIPS.

[10]  Silvio Savarese,et al.  Universal Correspondence Network , 2016, NIPS.

[11]  Marcel van Gerven,et al.  Convolutional Sketch Inversion , 2016, ECCV Workshops.

[12]  Jitendra Malik,et al.  View Synthesis by Appearance Flow , 2016, ECCV.

[13]  David W. Jacobs,et al.  WarpNet: Weakly Supervised Matching for Single-View Reconstruction , 2016, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[14]  Alexei A. Efros,et al.  Learning Dense Correspondence via 3D-Guided Cycle Consistency , 2016, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[15]  Abhinav Gupta,et al.  Marr Revisited: 2D-3D Alignment via Surface Normal Prediction , 2016, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[16]  Alexei A. Efros,et al.  Colorful Image Colorization , 2016, ECCV.

[17]  Li Fei-Fei,et al.  Perceptual Losses for Real-Time Style Transfer and Super-Resolution , 2016, ECCV.

[18]  Gregory Shakhnarovich,et al.  Learning Representations for Automatic Colorization , 2016, ECCV.

[19]  Abhinav Gupta,et al.  Generative Image Modeling Using Style and Structure Adversarial Networks , 2016, ECCV.

[20]  Soumith Chintala,et al.  Unsupervised Representation Learning with Deep Convolutional Generative Adversarial Networks , 2015, ICLR.

[21]  Qi-Xing Huang,et al.  Dense Human Body Correspondences Using Convolutional Networks , 2015, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[22]  Kavita Bala,et al.  Learning visual similarity for product design with convolutional neural networks , 2015, ACM Trans. Graph..

[23]  Rob Fergus,et al.  Deep Generative Image Models using a Laplacian Pyramid of Adversarial Networks , 2015, NIPS.

[24]  Saining Xie,et al.  Holistically-Nested Edge Detection , 2015, International Journal of Computer Vision.

[25]  Xiaogang Wang,et al.  Deep Learning Face Attributes in the Wild , 2014, 2015 IEEE International Conference on Computer Vision (ICCV).

[26]  Jitendra Malik,et al.  Hypercolumns for object segmentation and fine-grained localization , 2014, 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[27]  Abhinav Gupta,et al.  Designing deep networks for surface normal estimation , 2014, 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[28]  Rob Fergus,et al.  Predicting Depth, Surface Normals and Semantic Labels with a Common Multi-scale Convolutional Architecture , 2014, 2015 IEEE International Conference on Computer Vision (ICCV).

[29]  Trevor Darrell,et al.  Do Convnets Learn Correspondence? , 2014, NIPS.

[30]  Kristen Grauman,et al.  Fine-Grained Visual Comparisons with Local Learning , 2014, 2014 IEEE Conference on Computer Vision and Pattern Recognition.

[31]  Sanja Fidler,et al.  The Role of Context for Object Detection and Semantic Segmentation in the Wild , 2014, 2014 IEEE Conference on Computer Vision and Pattern Recognition.

[32]  Rob Fergus,et al.  Restoring an Image Taken through a Window Covered with Dirt or Rain , 2013, 2013 IEEE International Conference on Computer Vision.

[33]  Martial Hebert,et al.  Data-Driven 3D Primitives for Single Image Understanding , 2013, 2013 IEEE International Conference on Computer Vision.

[34]  Yann LeCun,et al.  Pedestrian Detection with Unsupervised Multi-stage Feature Learning , 2012, 2013 IEEE Conference on Computer Vision and Pattern Recognition.

[35]  Ce Liu,et al.  A Bayesian Approach to Alignment-Based Image Hallucination , 2012, ECCV.

[36]  C. V. Jawahar,et al.  Cats and dogs , 2012, 2012 IEEE Conference on Computer Vision and Pattern Recognition.

[37]  Tapani Raiko,et al.  Deep Learning Made Easier by Linear Transformations in Perceptrons , 2012, AISTATS.

[38]  Alexei A. Efros,et al.  Data-driven visual similarity for cross-domain image matching , 2011, ACM Trans. Graph..

[39]  Wojciech Matusik,et al.  CG2Real: Improving the Realism of Computer Generated Images Using a Large Collection of Photographs , 2011, IEEE Transactions on Visualization and Computer Graphics.

[40]  Antonio Torralba,et al.  SIFT Flow: Dense Correspondence across Scenes and Its Applications , 2011, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[41]  Alexei A. Efros,et al.  Scene completion using millions of photographs , 2008, Commun. ACM.

[42]  Michal Irani,et al.  Similarity by Composition , 2006, NIPS.

[43]  Michal Irani,et al.  Detecting Irregularities in Images and in Video , 2005, Tenth IEEE International Conference on Computer Vision (ICCV'05) Volume 1.

[44]  James M. Rehg,et al.  A data-driven approach to quantifying natural human motion , 2005, ACM Trans. Graph..

[45]  Yann LeCun,et al.  Learning a similarity metric discriminatively, with application to face verification , 2005, 2005 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR'05).

[46]  William T. Freeman,et al.  Example-Based Super-Resolution , 2002, IEEE Computer Graphics and Applications.

[47]  Alexei A. Efros,et al.  Image quilting for texture synthesis and transfer , 2001, SIGGRAPH.

[48]  David Salesin,et al.  Image Analogies , 2001, SIGGRAPH.

[49]  William T. Freeman,et al.  Learning low-level vision , 1999, Proceedings of the Seventh IEEE International Conference on Computer Vision.

[50]  Alexei A. Efros,et al.  Texture synthesis by non-parametric sampling , 1999, Proceedings of the Seventh IEEE International Conference on Computer Vision.