Recurrent Connections in the Primate Ventral Visual Stream Mediate a Tradeoff Between Task Performance and Network Size During Core Object Recognition

The ventral visual stream (VVS) enables humans and non-human primates to effortlessly recognize objects across a multitude of viewing conditions, yet the computational role of its abundant feedback connections remains unclear. Prior studies have augmented feedforward convolutional neural networks (CNNs) with recurrent connections to study their role in visual processing; however, these recurrent networks are often optimized directly on neural data, or the comparative metrics used are undefined for the standard feedforward networks that lack such connections. In this work, we develop task-optimized convolutional recurrent (ConvRNN) network models that more closely mimic the timing and gross neuroanatomy of the ventral pathway. Properly chosen intermediate-depth ConvRNN circuit architectures, which incorporate mechanisms of feedforward bypassing and recurrent gating, can achieve performance on a core recognition task comparable to that of much deeper feedforward networks. We then develop methods that allow us to compare both CNNs and ConvRNNs to fine-grained measurements of primate categorization behavior and neural response trajectories across thousands of stimuli. We find that high-performing ConvRNNs match these data better than feedforward networks of any depth, predicting the precise timing at which each stimulus is behaviorally decoded from neural activation patterns. Moreover, these ConvRNN circuits consistently produce quantitatively accurate predictions of neural dynamics in V4 and IT across the entire stimulus presentation. In fact, the highest-performing ConvRNNs, which best match the neural and behavioral data, also achieve a strong Pareto tradeoff between task performance and overall network size. Taken together, our results suggest that the functional purpose of recurrence in the ventral pathway is to fit a high-performing network in cortex, attaining computational power through temporal rather than spatial complexity.
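To make the two circuit mechanisms concrete, below is a minimal numpy-only sketch of a single ConvRNN layer update. The 1x1 convolutions, the weight names (w_ff, w_bypass, w_gate), and the exact gating equation are illustrative assumptions chosen for brevity; this is not the specific cell used in the paper, only the general pattern of a gated recurrent state driven by both a direct feedforward input and a bypass input.

```python
# Minimal sketch of one ConvRNN layer step with recurrent gating and a
# feedforward bypass. All weight shapes/names are illustrative assumptions.
import numpy as np

rng = np.random.default_rng(0)

def conv1x1(x, w):
    """1x1 'convolution': per-pixel channel mixing. x: (C_in, H, W), w: (C_out, C_in)."""
    return np.einsum('oc,chw->ohw', w, x)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

C, H, W = 8, 16, 16
w_ff     = rng.normal(0.0, 0.1, (C, C))  # weights on input from the layer below
w_bypass = rng.normal(0.0, 0.1, (C, C))  # weights on the feedforward bypass input
w_gate   = rng.normal(0.0, 0.1, (C, C))  # weights computing the recurrent gate

def convrnn_step(h_prev, x_below, x_bypass):
    # Feedforward drive: input from the immediately preceding layer, plus a
    # bypass connection that skips intermediate layers entirely.
    drive = conv1x1(x_below, w_ff) + conv1x1(x_bypass, w_bypass)
    # Recurrent gating: a sigmoid gate computed from the previous hidden state
    # arbitrates between retaining old state and admitting new drive.
    g = sigmoid(conv1x1(h_prev, w_gate))
    return g * h_prev + (1.0 - g) * np.maximum(drive, 0.0)  # ReLU on the drive

# Unroll over time with a fixed image, as in a core recognition trial:
# the hidden state keeps evolving even though the input is static.
h = np.zeros((C, H, W))
x_below = rng.normal(size=(C, H, W))
x_bypass = rng.normal(size=(C, H, W))
for t in range(5):
    h = convrnn_step(h, x_below, x_bypass)
print(h.shape)  # -> (8, 16, 16)
```

Note that unrolling the same weights across timesteps is what lets an intermediate-depth network trade spatial depth for temporal depth: each additional timestep adds another stage of nonlinear processing without adding any parameters.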
