A parametric texture model based on deep convolutional features closely matches texture appearance for humans

Our visual environment is full of texture (“stuff” like cloth, bark or gravel, as distinct from “things” like dresses, trees or paths), and humans are adept at perceiving subtle variations in material properties. To investigate image features important for texture perception, we psychophysically compare a recent parametric model of texture appearance (CNN model) that uses the features encoded by a deep convolutional neural network (VGG-19) with two other models: the venerable Portilla and Simoncelli model (PS) and an extension of the CNN model in which the power spectrum is additionally matched. Observers discriminated model-generated textures from original natural textures in a spatial three-alternative oddity paradigm under two viewing conditions: when test patches were briefly presented to the near-periphery (“parafoveal”) and when observers were able to make eye movements to all three patches (“inspection”). Under parafoveal viewing, observers were unable to discriminate 10 of 12 original images from CNN model images, and remarkably, the simpler PS model performed slightly better than the CNN model (11 textures). Under foveal inspection, matching CNN features captured appearance substantially better than the PS model (9 compared to 4 textures), and including the power spectrum improved appearance matching for two of the three remaining textures. None of the models tested here could produce indiscriminable images for one of the 12 textures under the inspection condition. While deep CNN (VGG-19) features can often be used to synthesise textures that humans cannot discriminate from natural textures, there is currently no uniformly best model for all textures and viewing conditions.
