Deep Neural Networks as a Computational Model for Human Shape Sensitivity

Theories of object recognition agree that shape is of primary importance, but there is no consensus about how shape might be represented, and attempts to implement a model of shape perception that works with realistic stimuli have so far largely failed. Recent studies suggest that state-of-the-art convolutional ‘deep’ neural networks (DNNs) capture important aspects of human object perception. We hypothesized that these successes might be partially related to a human-like representation of object shape. Here we demonstrate that sensitivity to shape features, characteristic of human and primate vision, emerges in DNNs when they are trained for generic object recognition from natural photographs. We show that these models explain human shape judgments for several benchmark behavioral and neural stimulus sets on which earlier models mostly failed. In particular, although never explicitly trained on such stimuli, DNNs develop acute sensitivity to minute variations in shape and to non-accidental properties that have long been proposed to form the basis of object recognition. Even more strikingly, when tested with a challenging stimulus set in which shape and category membership are dissociated, the most complex model architectures capture human shape sensitivity as well as some aspects of the category structure that emerges from human judgments. As a whole, these results indicate that convolutional neural networks not only learn physically correct representations of object categories but also develop perceptually accurate representational spaces of shapes. An even more complete model of human object representations might be within reach by training deep architectures on multiple tasks, as is characteristic of human development.
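
The abstract does not say how model representations were compared with human shape judgments. One common approach, sketched here purely as an illustration and not as the authors' procedure, is to build a dissimilarity matrix from a model layer's activations and rank-correlate its off-diagonal entries with a matrix of human dissimilarity judgments. All function names, the choice of 1 minus Pearson correlation as the distance, and the random stand-in data are assumptions made for this sketch.

```python
import numpy as np


def dissimilarity_matrix(features):
    """Pairwise dissimilarity (1 - Pearson r) between rows of `features`.

    `features` is assumed to be a (stimuli x units) array of activations.
    """
    return 1.0 - np.corrcoef(features)


def upper_triangle(matrix):
    """Return the off-diagonal upper-triangle entries as a flat vector."""
    rows, cols = np.triu_indices(matrix.shape[0], k=1)
    return matrix[rows, cols]


def rank(values):
    """Ordinal ranks; ties are broken by position, which is adequate
    for continuous data but is not a full tie-corrected ranking."""
    ranks = np.empty(len(values), dtype=float)
    ranks[np.argsort(values)] = np.arange(len(values))
    return ranks


def rdm_agreement(model_features, human_dissimilarity):
    """Rank correlation between model and human dissimilarities."""
    model_vec = upper_triangle(dissimilarity_matrix(model_features))
    human_vec = upper_triangle(np.asarray(human_dissimilarity, dtype=float))
    return float(np.corrcoef(rank(model_vec), rank(human_vec))[0, 1])


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # Hypothetical data: 6 stimuli, 50 model units.
    features = rng.normal(size=(6, 50))
    # Stand-in for human judgments: here simply the model's own matrix.
    human = dissimilarity_matrix(features)
    print(rdm_agreement(features, human))
```

With real data, `human_dissimilarity` would be a symmetric stimuli-by-stimuli matrix derived from behavioral judgments, and the agreement score could be computed separately for each network layer to see where shape sensitivity emerges.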
