From A to Z: Supervised Transfer of Style and Content Using Deep Neural Network Generators

We propose a new neural network architecture for solving single-image analogies - the generation of an entire set of stylistically similar images from just a single input image. Solving this problem requires separating image style from content. Our network is a modified variational autoencoder (VAE) that supports supervised training of single-image analogies and in-network evaluation of outputs with a structured similarity objective that captures pixel covariances. On the challenging task of generating a 62-letter font from a single example letter we produce images with 22.4% lower dissimilarity to the ground truth than state-of-the-art.

[1]  Yuting Zhang,et al.  Deep Visual Analogy-Making , 2015, NIPS.

[2]  Jan Kautz,et al.  Learning a manifold of fonts , 2014, ACM Trans. Graph..

[3]  Joshua B. Tenenbaum,et al.  Separating Style and Content with Bilinear Models , 2000, Neural Computation.

[4]  Yoshua Bengio,et al.  Unsupervised Models of Images by Spikeand-Slab RBMs , 2011, ICML.

[5]  Ali Farhadi,et al.  Visalogy: Answering Visual Analogy Questions , 2015, NIPS.

[6]  Juha Karhunen,et al.  Gated Boltzmann Machine in Texture Modeling , 2012, ICANN.

[7]  Jeffrey Dean,et al.  Distributed Representations of Words and Phrases and their Compositionality , 2013, NIPS.

[8]  Leon A. Gatys,et al.  A Neural Algorithm of Artistic Style , 2015, ArXiv.

[9]  Simon Haykin,et al.  GradientBased Learning Applied to Document Recognition , 2001 .

[10]  Eero P. Simoncelli,et al.  Image quality assessment: from error visibility to structural similarity , 2004, IEEE Transactions on Image Processing.

[11]  Max Welling,et al.  Auto-Encoding Variational Bayes , 2013, ICLR.

[12]  Razvan Pascanu,et al.  Theano: new features and speed improvements , 2012, ArXiv.

[13]  Zhou Wang,et al.  On the Mathematical Properties of the Structural Similarity Index , 2012, IEEE Transactions on Image Processing.

[14]  Robert W. Heath,et al.  Design of Linear Equalizers Optimized for the Structural Similarity Index , 2008, IEEE Transactions on Image Processing.

[15]  Other Contributors Are Indicated Where They Contribute The FreeType Project , 2017 .

[16]  Yoshua Bengio,et al.  Generative Adversarial Nets , 2014, NIPS.

[17]  Joshua B. Tenenbaum,et al.  Deep Convolutional Inverse Graphics Network , 2015, NIPS.

[18]  Geoffrey E. Hinton,et al.  Visualizing Data using t-SNE , 2008 .

[19]  David Salesin,et al.  Image Analogies , 2001, SIGGRAPH.

[20]  Max Welling,et al.  Semi-supervised Learning with Deep Generative Models , 2014, NIPS.

[21]  Robert W. Heath,et al.  A Linear Estimator Optimized for the Structural Similarity Index and its Application to Image Denoising , 2006, 2006 International Conference on Image Processing.

[22]  Yann LeCun,et al.  Generalization and network design strategies , 1989 .

[23]  Andrew Y. Ng,et al.  Reading Digits in Natural Images with Unsupervised Feature Learning , 2011 .

[24]  Geoffrey E. Hinton,et al.  Modeling pixel means and covariances using factorized third-order boltzmann machines , 2010, 2010 IEEE Computer Society Conference on Computer Vision and Pattern Recognition.

[25]  Yoshua Bengio,et al.  Gradient-based learning applied to document recognition , 1998, Proc. IEEE.

[26]  Alan C. Bovik,et al.  Mean squared error: Love it or leave it? A new look at Signal Fidelity Measures , 2009, IEEE Signal Processing Magazine.

[27]  Aaron Hertzmann,et al.  Exploratory font selection using crowdsourced attributes , 2014, ACM Trans. Graph..

[28]  Victor S. Lempitsky,et al.  Unsupervised Domain Adaptation by Backpropagation , 2014, ICML.

[29]  Geoffrey E. Hinton,et al.  Unsupervised Learning of Image Transformations , 2007, 2007 IEEE Conference on Computer Vision and Pattern Recognition.

[30]  Yann LeCun,et al.  Convolutional Learning of Spatio-temporal Features , 2010, ECCV.

[31]  Geoffrey E. Hinton,et al.  Generating more realistic images using gated MRF's , 2010, NIPS.

[32]  Zhou Wang,et al.  Stimulus synthesis for efficient evaluation and refinement of perceptual image quality metrics , 2004, IS&T/SPIE Electronic Imaging.

[33]  Markus H. Gross,et al.  Perceptually based downscaling of images , 2015, ACM Trans. Graph..