Invariant object recognition is a personalized selection of invariant features in humans, not simply explained by hierarchical feed-forward vision models

One key ability of the human brain is invariant object recognition, which refers to the rapid and accurate recognition of objects in the presence of variations such as size, rotation and position. Despite decades of research into the topic, it remains unknown how the brain constructs invariant representations of objects. Because they provide brain-plausible object representations and reach human-level recognition accuracy, hierarchical models of human vision have suggested that the human brain implements similar feed-forward operations to obtain invariant representations. However, in two psychophysical object recognition experiments in which object variations were systematically controlled, we observed that humans relied on specific (diagnostic) object regions for accurate recognition, and that these regions remained relatively consistent (invariant) across variations; feed-forward feature-extraction models, by contrast, selected view-specific (non-invariant) features across variations. This suggests that models can reach human-level recognition performance while using strategies different from those of humans. Moreover, individual humans largely disagreed on their diagnostic features and flexibly shifted their feature-extraction strategy from view-invariant to view-specific when objects became more similar. This implies that, even in rapid object recognition, diagnostic features are not extracted by a set of hard-wired feed-forward mechanisms; rather, the bottom-up visual pathways receive task-related information, possibly processed in prefrontal cortex, through top-down connections.
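The contrast between invariant and view-specific feature selection can be made concrete with a simple consistency score. The sketch below is a hypothetical illustration, not the analysis used in this work: it assumes each observer (or model) yields one diagnostic map per variation level, stored as a flattened list of per-pixel weights already aligned to a common object-centred frame, and it takes the mean pairwise Pearson correlation between these maps as an "invariance index". The names `pearson`, `invariance_index` and the toy maps are illustrative assumptions. Values near 1 indicate that the same regions are used across variations; values near or below 0 indicate view-specific selection.

```python
from itertools import combinations
from math import sqrt
from statistics import mean


def pearson(x, y):
    """Pearson correlation between two equal-length sequences."""
    if len(x) != len(y):
        raise ValueError("maps must have the same number of pixels")
    mx, my = mean(x), mean(y)
    dx = [a - mx for a in x]
    dy = [b - my for b in y]
    denom = sqrt(sum(a * a for a in dx) * sum(b * b for b in dy))
    if denom == 0:
        raise ValueError("constant map: correlation undefined")
    return sum(a * b for a, b in zip(dx, dy)) / denom


def invariance_index(maps_by_variation):
    """Mean pairwise correlation of diagnostic maps across variation levels.

    Hypothetical input: one flattened diagnostic map per variation level
    (e.g. per size or rotation), assumed to be aligned to a common
    object-centred frame before this function is called.
    """
    if len(maps_by_variation) < 2:
        raise ValueError("need at least two variation levels")
    return mean(pearson(a, b) for a, b in combinations(maps_by_variation, 2))


# Toy maps: an "invariant" observer reuses the same region in every view,
# while a "view-specific" observer relies on a different region in each view.
invariant = [[1, 1, 0, 0], [1, 1, 0, 0], [0.9, 1, 0.1, 0]]
view_specific = [[1, 1, 0, 0], [0, 0, 1, 1], [1, 0, 1, 0]]

print(round(invariance_index(invariant), 3))
print(round(invariance_index(view_specific), 3))
```

Averaging over all pairs of variation levels, rather than comparing each view only with a reference view, avoids privileging any single view; real data would also require a chance baseline (for example, shuffled maps) before interpreting the index.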
