Perceptual Losses for Real-Time Style Transfer and Super-Resolution

We consider image transformation problems, where an input image is transformed into an output image. Recent methods for such problems typically train feed-forward convolutional neural networks using a per-pixel loss between the output and ground-truth images. Parallel work has shown that high-quality images can be generated by defining and optimizing perceptual loss functions based on high-level features extracted from pretrained networks. We combine the benefits of both approaches, and propose the use of perceptual loss functions for training feed-forward networks for image transformation tasks. We show results on image style transfer, where a feed-forward network is trained to solve the optimization problem proposed by Gatys et al. in real-time. Compared to the optimization-based method, our network gives similar qualitative results but is three orders of magnitude faster. We also experiment with single-image super-resolution, where replacing a per-pixel loss with a perceptual loss gives visually pleasing results.

[1]  Michal Irani,et al.  Improving resolution by image registration , 1991, CVGIP Graph. Model. Image Process..

[2]  William T. Freeman,et al.  Example-Based Super-Resolution , 2002, IEEE Computer Graphics and Applications.

[3]  Nanning Zheng,et al.  Image hallucination with primal sketch priors , 2003, 2003 IEEE Computer Society Conference on Computer Vision and Pattern Recognition, 2003. Proceedings..

[4]  D. Yeung,et al.  Super-resolution through neighbor embedding , 2004, CVPR 2004.

[5]  Eero P. Simoncelli,et al.  Image quality assessment: from error visibility to structural similarity , 2004, IEEE Transactions on Image Processing.

[6]  Eric Dubois,et al.  Image up-sampling using total-variation regularization with a new observation model , 2005, IEEE Transactions on Image Processing.

[7]  Alan C. Bovik,et al.  A Statistical Evaluation of Recent Full Reference Image Quality Assessment Algorithms , 2006, IEEE Transactions on Image Processing.

[8]  Alan C. Bovik,et al.  Image information and visual quality , 2006, IEEE Trans. Image Process..

[9]  Truong Q. Nguyen,et al.  Image Superresolution Using Support Vector Regression , 2007, IEEE Transactions on Image Processing.

[10]  H. Shum,et al.  Image super-resolution using gradient profile prior , 2008, 2008 IEEE Conference on Computer Vision and Pattern Recognition.

[11]  Chi-Keung Tang,et al.  Fast image/video upsampling , 2008, SIGGRAPH 2008.

[12]  Thomas S. Huang,et al.  Image super-resolution as sparse representation of raw image patches , 2008, 2008 IEEE Conference on Computer Vision and Pattern Recognition.

[13]  Mohammed Ghanbari,et al.  Scope of validity of PSNR in image/video quality assessment , 2008 .

[14]  Alan C. Bovik,et al.  Mean squared error: Love it or leave it? A new look at Signal Fidelity Measures , 2009, IEEE Signal Processing Magazine.

[15]  Michal Irani,et al.  Super-resolution from a single image , 2009, 2009 IEEE 12th International Conference on Computer Vision.

[16]  Zhiwei Xiong,et al.  Robust Web Image/Video Super-Resolution , 2010, IEEE Transactions on Image Processing.

[17]  Thomas S. Huang,et al.  Non-Local Kernel Regression for Image and Video Restoration , 2010, ECCV.

[18]  Kwang In Kim,et al.  Single-Image Super-Resolution Using Sparse Regression and Natural Image Prior , 2010, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[19]  Michael Elad,et al.  On Single Image Scale-Up Using Sparse-Representations , 2010, Curves and Surfaces.

[20]  Thomas S. Huang,et al.  Image Super-Resolution Via Sparse Representation , 2010, IEEE Transactions on Image Processing.

[21]  Clément Farabet,et al.  Torch7: A Matlab-like Environment for Machine Learning , 2011, NIPS 2011.

[22]  David Zhang,et al.  FSIM: A Feature Similarity Index for Image Quality Assessment , 2011, IEEE Transactions on Image Processing.

[23]  Raanan Fattal,et al.  Image and video upscaling from local self-examples , 2011, TOGS.

[24]  Pierre Vandergheynst,et al.  Beyond bits: Reconstructing images from Local Binary Descriptors , 2012, Proceedings of the 21st International Conference on Pattern Recognition (ICPR2012).

[25]  Aline Roumy,et al.  Low-Complexity Single-Image Super-Resolution based on Nonnegative Neighbor Embedding , 2012, BMVC.

[26]  Camille Couprie,et al.  Learning Hierarchical Features for Scene Labeling , 2013, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[27]  Zhe L. Lin,et al.  Fast Image Super-Resolution Based on In-Place Example Regression , 2013, 2013 IEEE Conference on Computer Vision and Pattern Recognition.

[28]  Touradj Ebrahimi,et al.  Benchmarking of quality metrics on ultra-high definition video sequences , 2013, 2013 18th International Conference on Digital Signal Processing (DSP).

[29]  Ronan Collobert,et al.  Recurrent Convolutional Neural Networks for Scene Parsing , 2013, ArXiv.

[30]  Antonio Torralba,et al.  HOGgles: Visualizing Object Detection Features , 2013, 2013 IEEE International Conference on Computer Vision.

[31]  Russell Zaretzki,et al.  Beta Process Joint Dictionary Learning for Coupled Feature Spaces with Application to Single Image Super-Resolution , 2013, 2013 IEEE Conference on Computer Vision and Pattern Recognition.

[32]  Xiaoou Tang,et al.  Learning a Deep Convolutional Network for Image Super-Resolution , 2014, ECCV.

[33]  Rob Fergus,et al.  Depth Map Prediction from a Single Image using a Multi-Scale Deep Network , 2014, NIPS.

[34]  Ronan Collobert,et al.  Recurrent Convolutional Neural Networks for Scene Labeling , 2014, ICML.

[35]  John Tran,et al.  cuDNN: Efficient Primitives for Deep Learning , 2014, ArXiv.

[36]  Pietro Perona,et al.  Microsoft COCO: Common Objects in Context , 2014, ECCV.

[37]  Chih-Yuan Yang,et al.  Single-Image Super-Resolution: A Benchmark , 2014, ECCV.

[38]  Luc Van Gool,et al.  A+: Adjusted Anchored Neighborhood Regression for Fast Super-Resolution , 2014, ACCV.

[39]  Joan Bruna,et al.  Intriguing properties of neural networks , 2013, ICLR.

[40]  Andrew Zisserman,et al.  Deep Inside Convolutional Networks: Visualising Image Classification Models and Saliency Maps , 2013, ICLR.

[41]  Alexandre Alahi,et al.  From Bits to Images: Inversion of Local Binary Descriptors , 2012, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[42]  Makoto Takizawa,et al.  Future Data and Security Engineering , 2014, Lecture Notes in Computer Science.

[43]  Leon A. Gatys,et al.  Texture Synthesis Using Convolutional Neural Networks , 2015, NIPS.

[44]  Brian L. Evans,et al.  Full-reference visual quality assessment for synthetic images: A subjective study , 2015, 2015 IEEE International Conference on Image Processing (ICIP).

[45]  Hod Lipson,et al.  Understanding Neural Networks Through Deep Visualization , 2015, ArXiv.

[46]  Narendra Ahuja,et al.  Single image super-resolution from transformed self-exemplars , 2015, 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[47]  Guosheng Lin,et al.  Deep convolutional neural fields for depth estimation from a single image , 2014, 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[48]  Jason Yosinski,et al.  Deep neural networks are easily fooled: High confidence predictions for unrecognizable images , 2014, 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[49]  Sergey Ioffe,et al.  Batch Normalization: Accelerating Deep Network Training by Reducing Internal Covariate Shift , 2015, ICML.

[50]  Andrea Vedaldi,et al.  Understanding deep image representations by inverting them , 2014, 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[51]  Abhinav Gupta,et al.  Designing deep networks for surface normal estimation , 2014, 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[52]  Rob Fergus,et al.  Predicting Depth, Surface Normals and Semantic Labels with a Common Multi-scale Convolutional Architecture , 2014, 2015 IEEE International Conference on Computer Vision (ICCV).

[53]  Bin Sheng,et al.  Deep Colorization , 2015, 2015 IEEE International Conference on Computer Vision (ICCV).

[54]  Horst Bischof,et al.  Fast and accurate image upscaling with super-resolution forests , 2015, 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[55]  Trevor Darrell,et al.  Fully convolutional networks for semantic segmentation , 2015, 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[56]  Jimmy Ba,et al.  Adam: A Method for Stochastic Optimization , 2014, ICLR.

[57]  Vibhav Vineet,et al.  Conditional Random Fields as Recurrent Neural Networks , 2015, 2015 IEEE International Conference on Computer Vision (ICCV).

[58]  한보형,et al.  Learning Deconvolution Network for Semantic Segmentation , 2015 .

[59]  Michael S. Bernstein,et al.  ImageNet Large Scale Visual Recognition Challenge , 2014, International Journal of Computer Vision.

[60]  Andrew Zisserman,et al.  Very Deep Convolutional Networks for Large-Scale Image Recognition , 2014, ICLR.

[61]  Leon A. Gatys,et al.  A Neural Algorithm of Artistic Style , 2015, ArXiv.

[62]  Chuan Li,et al.  Precomputed Real-Time Texture Synthesis with Markovian Generative Adversarial Networks , 2016, ECCV.

[63]  Thomas Brox,et al.  Inverting Visual Representations with Convolutional Networks , 2015, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[64]  Jian Sun,et al.  Deep Residual Learning for Image Recognition , 2015, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[65]  Xiaoou Tang,et al.  Image Super-Resolution Using Deep Convolutional Networks , 2014, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[66]  Leon A. Gatys,et al.  Image Style Transfer Using Convolutional Neural Networks , 2016, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[67]  Alexei A. Efros,et al.  Colorful Image Colorization , 2016, ECCV.

[68]  Soumith Chintala,et al.  Unsupervised Representation Learning with Deep Convolutional Generative Adversarial Networks , 2015, ICLR.

[69]  Andrea Vedaldi,et al.  Texture Networks: Feed-forward Synthesis of Textures and Stylized Images , 2016, ICML.