Texture synthesis and the controlled generation of natural stimuli using convolutional neural networks

It is a long standing question how biological systems transform visual inputs to robustly infer high-level visual information. Research in the last decades has established that much of the underlying computations take place in a hierarchical fashion along the ventral visual pathway. However, the exact processing stages along this hierarchy are difficult to characterise. Here we present a method to generate stimuli that will allow a principled description of the processing stages along the ventral stream. We introduce a new parametric texture model based on the powerful feature spaces of convolutional neural networks optimised for object recognition. We show that constraining a spatial summary statistic over feature maps suffices to synthesise high-quality natural textures. Moreover we establish that our texture representations continuously disentangle high-level visual information and demonstrate that the hierarchical parameterisation of the texture model naturally enables us to generate novel types of stimuli for systematically probing mid-level vision.

[1]  William T. Freeman,et al.  Presented at: 2nd Annual IEEE International Conference on Image , 1995 .

[2]  Jorge Nocedal,et al.  Algorithm 778: L-BFGS-B: Fortran subroutines for large-scale bound-constrained optimization , 1997, TOMS.

[3]  Irfan A. Essa,et al.  Graphcut textures: image and video synthesis using graph cuts , 2003, ACM Trans. Graph..

[4]  Eero P. Simoncelli,et al.  A Parametric Texture Model Based on Joint Statistics of Complex Wavelet Coefficients , 2000, International Journal of Computer Vision.

[5]  R. Rosenholtz,et al.  A summary-statistic representation in peripheral vision explains visual crowding. , 2009, Journal of vision.

[6]  Simoncelli Eero Metamers of the ventral stream , 2010 .

[7]  Eero P. Simoncelli,et al.  Metamers of the ventral stream , 2011, Nature Neuroscience.

[8]  R. Rosenholtz,et al.  A summary statistic representation in peripheral vision explains visual search. , 2009, Journal of vision.

[9]  James J. DiCarlo,et al.  How Does the Brain Solve Visual Object Recognition? , 2012, Neuron.

[10]  Geoffrey E. Hinton,et al.  ImageNet classification with deep convolutional neural networks , 2012, Commun. ACM.

[11]  Klaus-Robert Müller,et al.  Efficient BackProp , 2012, Neural Networks: Tricks of the Trade.

[12]  Eero P. Simoncelli,et al.  A functional and perceptual signature of the second visual area in primates , 2013, Nature Neuroscience.

[13]  Ha Hong,et al.  Performance-optimized hierarchical models predict neural responses in higher visual cortex , 2014, Proceedings of the National Academy of Sciences.

[14]  H. Komatsu,et al.  Image statistics underlying natural texture selectivity of neurons in macaque V4 , 2014, Proceedings of the National Academy of Sciences.

[15]  Subhransu Maji,et al.  Deep convolutional filter banks for texture recognition and segmentation , 2014, ArXiv.

[16]  Trevor Darrell,et al.  Caffe: Convolutional Architecture for Fast Feature Embedding , 2014, ACM Multimedia.

[17]  Eero P. Simoncelli,et al.  Representation of Naturalistic Image Structure in the Primate Visual Cortex. , 2014, Cold Spring Harbor symposia on quantitative biology.

[18]  Nikolaus Kriegeskorte,et al.  Deep Supervised, but Not Unsupervised, Models May Explain IT Cortical Representation , 2014, PLoS Comput. Biol..

[19]  Jian Sun,et al.  Spatial Pyramid Pooling in Deep Convolutional Networks for Visual Recognition , 2014, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[20]  Jian Sun,et al.  Delving Deep into Rectifiers: Surpassing Human-Level Performance on ImageNet Classification , 2015, 2015 IEEE International Conference on Computer Vision (ICCV).

[21]  Dumitru Erhan,et al.  Going deeper with convolutions , 2014, 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[22]  Michael S. Bernstein,et al.  ImageNet Large Scale Visual Recognition Challenge , 2014, International Journal of Computer Vision.

[23]  Andrew Zisserman,et al.  Very Deep Convolutional Networks for Large-Scale Image Recognition , 2014, ICLR.