Task-Driven Convolutional Recurrent Models of the Visual System

Feed-forward convolutional neural networks (CNNs) are currently state-of-the-art for object classification tasks such as ImageNet. Further, they are quantitatively accurate models of temporally-averaged responses of neurons in the primate brain's visual system. However, biological visual systems have two ubiquitous architectural features not shared with typical CNNs: local recurrence within cortical areas, and long-range feedback from downstream areas to upstream areas. Here we explored the role of recurrence in improving classification performance. We found that standard forms of recurrence (vanilla RNNs and LSTMs) do not perform well within deep CNNs on the ImageNet task. In contrast, novel cells that incorporated two structural features, bypassing and gating, were able to boost task accuracy substantially. We extended these design principles in an automated search over thousands of model architectures, which identified novel local recurrent cells and long-range feedback connections useful for object recognition. Moreover, these task-optimized ConvRNNs matched the dynamics of neural activity in the primate visual system better than feedforward networks, suggesting a role for the brain's recurrent connections in performing difficult visual behaviors.
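To make the "bypassing and gating" idea concrete, below is a minimal sketch of a gated convolutional recurrent cell with an additive bypass path, written in PyTorch. The class name `GatedBypassConvCell`, the single-gate update rule, and the additive bypass are illustrative assumptions chosen for clarity; they are not the exact cells found by the paper's architecture search.

```python
import torch
import torch.nn as nn


class GatedBypassConvCell(nn.Module):
    """Illustrative convolutional recurrent cell combining the two structural
    features named in the abstract: gating (a learned gate mixes the recurrent
    update with the previous hidden state) and bypassing (the feedforward
    input is added to the output unchanged). Sizes and the single-gate update
    are assumptions for illustration, not the paper's exact cell."""

    def __init__(self, channels: int, kernel_size: int = 3):
        super().__init__()
        padding = kernel_size // 2
        # Gate and candidate state are computed from the feedforward input and
        # the previous hidden state, concatenated along the channel axis.
        self.gate_conv = nn.Conv2d(2 * channels, channels, kernel_size, padding=padding)
        self.cand_conv = nn.Conv2d(2 * channels, channels, kernel_size, padding=padding)

    def forward(self, x, h_prev):
        # x:      feedforward input to this layer, shape (B, C, H, W)
        # h_prev: hidden state from the previous time step, same shape
        xh = torch.cat([x, h_prev], dim=1)
        gate = torch.sigmoid(self.gate_conv(xh))    # per-location, per-channel gate
        candidate = torch.tanh(self.cand_conv(xh))  # proposed recurrent update
        h_new = gate * candidate + (1.0 - gate) * h_prev
        # Bypass: the feedforward input skips the recurrence and is added to
        # the output, so useful features (and gradients) are not forced
        # through the recurrent transformation.
        out = h_new + x
        return out, h_new


# Unroll the cell for a few time steps at a single layer.
cell = GatedBypassConvCell(channels=64)
x = torch.randn(1, 64, 56, 56)   # fixed feedforward drive
h = torch.zeros_like(x)          # initial hidden state
for _ in range(4):
    out, h = cell(x, h)
```

Unrolling the cell over time at a fixed layer gives the kind of within-area (local) recurrence the abstract describes; the long-range feedback connections identified by the architecture search are not modeled in this sketch.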
