Visual Object Recognition: Do We (Finally) Know More Now Than We Did?

How do we recognize objects despite changes in their appearance? The past three decades have witnessed intense debates over two questions: whether objects are encoded invariantly with respect to viewing conditions, and whether specialized, separable mechanisms are used to recognize different object categories. We argue that such dichotomous debates ask the wrong question. Much more important is the nature of object representations: What are the features that enable invariance or differential processing between categories? Although the nature of object features remains an open question, new methods for connecting data to models show significant potential for helping us better understand neural codes for objects. Most prominently, new approaches to analyzing functional magnetic resonance imaging data, including neural decoding and representational similarity analysis, and new computational models of vision, including convolutional neural networks, have enabled a much more nuanced understanding of visual representation. Convolutional neural networks are particularly intriguing as a tool for studying biological vision because this class of artificial vision systems, based on biologically plausible deep neural networks, exhibits visual recognition capabilities that approach those of human observers. As these models improve in recognition performance, they also appear to become more effective at predicting and accounting for neural responses in the ventral cortex. Applying these and other deep models to empirical data shows great promise for enabling future progress in the study of visual recognition.
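
As a rough illustration of how representational similarity analysis can connect neural data to a model, the sketch below builds a representational dissimilarity matrix (RDM) for each system and compares the two RDMs by rank correlation. It is a minimal sketch under stated assumptions, not the procedure of any particular study: the function names (`rdm`, `ranks`, `rsa_score`), the toy response patterns, and the choice of 1 minus Pearson correlation as the dissimilarity measure are all illustrative.

```python
from itertools import combinations
from math import sqrt


def pearson(x, y):
    """Pearson correlation between two equal-length, non-constant sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def rdm(patterns):
    """Upper triangle of the RDM for a list of response patterns.

    Each pattern is the response vector (voxels or model units) to one
    stimulus. Dissimilarity is 1 - Pearson r, an assumed, common choice.
    """
    return [1.0 - pearson(patterns[i], patterns[j])
            for i, j in combinations(range(len(patterns)), 2)]


def ranks(values):
    """Ranks starting at 1; tied values share their mean rank."""
    order = sorted(range(len(values)), key=lambda k: values[k])
    out = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            out[order[k]] = mean_rank
        i = j + 1
    return out


def rsa_score(patterns_a, patterns_b):
    """Spearman correlation between the RDMs of two systems."""
    return pearson(ranks(rdm(patterns_a)), ranks(rdm(patterns_b)))


# Hypothetical toy data: 4 stimuli, 3 "voxels" vs. 4 "model units".
brain_patterns = [[1.0, 2.0, 3.0], [1.0, 3.0, 2.0],
                  [3.0, 1.0, 2.0], [2.0, 1.0, 4.0]]
model_patterns = [[0.1, 0.9, 0.2, 0.4], [0.2, 0.8, 0.1, 0.7],
                  [0.9, 0.1, 0.3, 0.2], [0.3, 0.2, 0.9, 0.5]]

print(rsa_score(brain_patterns, model_patterns))
```

Because only the pairwise dissimilarity structure is compared, the two systems need not share a dimensionality: here the "brain" patterns have three measurements per stimulus and the "model" patterns have four.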
