Image content is more important than Bouma’s Law for scene metamers

We subjectively perceive our visual field with high fidelity, yet peripheral distortions can go unnoticed and peripheral objects can be difficult to identify (crowding). Prior work showed that humans could not discriminate images synthesised to match the responses of a mid-level ventral visual stream model when information was averaged in receptive fields with a scaling of about half their retinal eccentricity. This result implicated ventral visual area V2, approximated ‘Bouma’s Law’ of crowding, and has subsequently been interpreted as a link between crowding zones, receptive field scaling, and our perceptual experience. However, this experiment never assessed natural images. We find that humans can easily discriminate real and model-generated images at V2 scaling, requiring scales at least as small as V1 receptive fields to generate metamers. We speculate that explaining why scenes look as they do may require incorporating segmentation and global organisational constraints in addition to local pooling.
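To make the scaling statement concrete, here is a minimal sketch of how pooling-region size grows with eccentricity under a simple linear rule. It assumes diameter = scaling × eccentricity with no offset, which is a simplification and not necessarily the exact form used in the models discussed. The function name and the example eccentricities are hypothetical; the value 0.5 reflects the "about half" scaling mentioned above.

```python
def pooling_diameter_deg(eccentricity_deg: float, scaling: float) -> float:
    """Return an approximate pooling-region diameter in degrees of visual angle.

    Assumes a purely linear rule (diameter = scaling * eccentricity),
    which is an illustrative simplification.
    """
    if eccentricity_deg < 0 or scaling < 0:
        raise ValueError("eccentricity and scaling must be non-negative")
    return scaling * eccentricity_deg


# "About half the retinal eccentricity", as described in the abstract.
BOUMA_LIKE_SCALING = 0.5

if __name__ == "__main__":
    # Example eccentricities are arbitrary illustration values.
    for ecc in (2.0, 5.0, 10.0):
        diameter = pooling_diameter_deg(ecc, BOUMA_LIKE_SCALING)
        print(f"eccentricity {ecc:4.1f} deg -> pooling diameter {diameter:4.1f} deg")
```

A smaller scaling value passed to the same function gives proportionally smaller pooling regions at every eccentricity, which is the direction the abstract's finding points in for natural scenes.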
