Evidence that recurrent circuits are critical to the ventral stream’s execution of core object recognition behavior

Non-recurrent deep convolutional neural networks (CNNs) are currently the best at modeling core object recognition, a behavior that is supported by the densely recurrent primate ventral stream, culminating in the inferior temporal (IT) cortex. If recurrence is critical to this behavior, then primates should outperform feedforward-only deep CNNs for images that require additional recurrent processing beyond the feedforward IT response. Here we first used behavioral methods to discover hundreds of these ‘challenge’ images. Second, using large-scale electrophysiology, we observed that behaviorally sufficient object identity solutions emerged ~30 ms later in the IT cortex for challenge images compared with primate performance-matched ‘control’ images. Third, these behaviorally critical late-phase IT response patterns were poorly predicted by feedforward deep CNN activations. Notably, very-deep CNNs and shallower recurrent CNNs better predicted these late IT responses, suggesting that there is a functional equivalence between additional nonlinear transformations and recurrence. Beyond arguing that recurrent circuits are critical for rapid object identification, our results provide strong constraints for future recurrent model development.

Using model- and primate behavior-driven image selection with large-scale electrophysiology in monkeys performing core recognition tasks, Kar et al. provide evidence that automatically engaged recurrent circuits are critical for rapid object identification.
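The image-selection idea in the abstract (find images on which primates outperform a feedforward CNN, then pair them with control images matched on primate performance) can be illustrated with a minimal sketch. This is not the authors' procedure: the function names, the `gap` and `tol` thresholds, the greedy nearest-neighbor matching, and the performance units are all assumptions chosen for illustration.

```python
import numpy as np

def split_challenge_control(primate_perf, model_perf, gap=1.5, tol=0.4):
    """Hypothetical split of images by the primate-minus-model performance gap.

    gap and tol are illustrative thresholds, not values from the paper.
    Returns index arrays (challenge, control).
    """
    primate_perf = np.asarray(primate_perf, dtype=float)
    model_perf = np.asarray(model_perf, dtype=float)
    diff = primate_perf - model_perf
    challenge = np.flatnonzero(diff >= gap)        # primates clearly better
    control = np.flatnonzero(np.abs(diff) <= tol)  # model roughly on par
    return challenge, control

def match_controls(primate_perf, challenge, control):
    """Greedily pair each challenge image with the unused control image
    whose primate performance is closest (assumed matching scheme)."""
    primate_perf = np.asarray(primate_perf, dtype=float)
    available = [int(k) for k in control]
    pairs = []
    for c in challenge:
        if not available:
            break  # more challenge images than controls
        best = min(available, key=lambda k: abs(primate_perf[k] - primate_perf[c]))
        pairs.append((int(c), best))
        available.remove(best)
    return pairs

# Toy per-image scores (made up for the example, not data from the study).
primate = [3.0, 2.75, 1.0, 3.5]
model = [1.0, 2.5, 1.25, 3.25]
challenge, control = split_challenge_control(primate, model)
pairs = match_controls(primate, challenge, control)
```

Matching controls on primate performance is what lets a later comparison (for example, of IT decoding latency) be attributed to the model gap rather than to overall image difficulty.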
