Recurrent convolutional neural networks: a better model of biological object recognition under occlusion

Feedforward neural networks provide the dominant model of how the brain performs visual object recognition. However, these networks lack the lateral and feedback connections, and the resulting recurrent neuronal dynamics, of the ventral visual pathway in the human and nonhuman primate brain. Here we investigate recurrent convolutional neural networks with bottom-up (B), lateral (L), and top-down (T) connections. Combining these types of connections yields four architectures (B, BT, BL, and BLT), which we systematically test and compare. We hypothesized that recurrent dynamics might improve recognition performance in the challenging scenario of partial occlusion. We introduce two novel occluded object recognition tasks to test the efficacy of the models, digit clutter (where multiple target digits occlude one another) and digit debris (where target digits are occluded by digit fragments). We find that recurrent neural networks outperform feedforward control models (approximately matched in parametric complexity) at recognising objects, both in the absence of occlusion and in all occlusion conditions. Recurrent neural networks are not only more neurobiologically plausible in their architecture; their dynamics also afford superior task performance. This work shows that computer vision can benefit from using recurrent convolutional architectures and suggests that the ubiquitous recurrent connections in biological brains are essential for task performance.

[1]  Y. Benjamini,et al.  Controlling the false discovery rate: a practical and powerful approach to multiple testing , 1995 .

[2]  Tandra Ghose,et al.  Generalization between canonical and non-canonical views in object recognition. , 2013, Journal of vision.

[3]  Xiaolin Hu,et al.  Recurrent convolutional neural network for object recognition , 2015, 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[4]  M. Potter Short-term conceptual memory for pictures. , 1976, Journal of experimental psychology. Human learning and memory.

[5]  E. Rolls,et al.  INVARIANT FACE AND OBJECT RECOGNITION IN THE VISUAL SYSTEM , 1997, Progress in Neurobiology.

[6]  Denis Fize,et al.  Speed of processing in the human visual system , 1996, Nature.

[7]  David J. Jilk,et al.  Recurrent Processing during Object Recognition , 2011, Front. Psychol..

[8]  Tim Curran,et al.  The Limits of Feedforward Vision: Recurrent Processing Promotes Robust Object Recognition when Objects Are Degraded , 2012, Journal of Cognitive Neuroscience.

[9]  D. Hassabis,et al.  Neuroscience-Inspired Artificial Intelligence , 2017, Neuron.

[10]  Graham W. Taylor,et al.  Deconvolutional networks , 2010, 2010 IEEE Computer Society Conference on Computer Vision and Pattern Recognition.

[11]  Doris Y. Tsao,et al.  Functional Compartmentalization and Viewpoint Generalization Within the Macaque Face-Processing System , 2010, Science.

[12]  Geoffrey E. Hinton,et al.  ImageNet classification with deep convolutional neural networks , 2012, Commun. ACM.

[13]  Scott L. Brincat,et al.  Dynamic Shape Synthesis in Posterior Inferotemporal Cortex , 2006, Neuron.

[14]  Tomaso A. Poggio,et al.  Bridging the Gaps Between Residual Learning, Recurrent Neural Networks and Visual Cortex , 2016, ArXiv.

[15]  Li Zhaoping,et al.  Border Ownership from Intracortical Interactions in Visual Area V2 , 2005, Neuron.

[16]  Ha Hong,et al.  Hierarchical Modular Optimization of Convolutional Networks Achieves Representations Similar to Macaque IT and Human Ventral Stream , 2013, NIPS.

[17]  Kenji Kawano,et al.  Global and fine information coded by single neurons in the temporal visual cortex , 1999, Nature.

[18]  Guigang Zhang,et al.  Deep Learning , 2016, Int. J. Semantic Comput..

[19]  T. Poggio,et al.  Hierarchical models of object recognition in cortex , 1999, Nature Neuroscience.

[20]  Ha Hong,et al.  Simple Learned Weighted Sums of Inferior Temporal Neuronal Firing Rates Accurately Predict Human Core Object Recognition Performance , 2015, The Journal of Neuroscience.

[21]  Joseph R. Madsen,et al.  Spatiotemporal Dynamics Underlying Object Completion in Human Ventral Visual Cortex , 2014, Neuron.

[22]  Yoshua Bengio,et al.  Gradient-based learning applied to document recognition , 1998, Proc. IEEE.

[23]  R. von der Heydt,et al.  A neural model of figure-ground organization. , 2007, Journal of neurophysiology.

[24]  Fred Henrik Hamker,et al.  Competition improves robustness against loss of information , 2015, Front. Comput. Neurosci..

[25]  James J. DiCarlo,et al.  How Does the Brain Solve Visual Object Recognition? , 2012, Neuron.

[26]  Ha Hong,et al.  Performance-optimized hierarchical models predict neural responses in higher visual cortex , 2014, Proceedings of the National Academy of Sciences.

[27]  Olaf Sporns,et al.  The small world of the cerebral cortex , 2007, Neuroinformatics.

[28]  Q. Mcnemar Note on the sampling error of the difference between correlated proportions or percentages , 1947, Psychometrika.

[29]  D. J. Felleman,et al.  Distributed hierarchical processing in the primate cerebral cortex. , 1991, Cerebral cortex.

[30]  Nikola T. Markov,et al.  Anatomy of hierarchy: Feedforward and feedback pathways in macaque visual cortex , 2013, The Journal of comparative neurology.

[31]  David A. Tovar,et al.  Representational dynamics of object vision: the first 1000 ms. , 2013, Journal of vision.

[32]  D. Hubel,et al.  Receptive fields and functional architecture of monkey striate cortex , 1968, The Journal of physiology.

[33]  Thomas G. Dietterich Approximate Statistical Tests for Comparing Supervised Classification Learning Algorithms , 1998, Neural Computation.

[34]  Tomaso Poggio,et al.  Fast Readout of Object Identity from Macaque Inferior Temporal Cortex , 2005, Science.

[35]  Nikolaus Kriegeskorte,et al.  Deep neural networks: a new framework for modelling biological vision and brain information processing , 2015, bioRxiv.

[36]  P. Goldman-Rakic,et al.  Preface: Cerebral Cortex Has Come of Age , 1991 .

[37]  Dima Damen,et al.  Recognizing linked events: Searching the space of feasible explanations , 2009, 2009 IEEE Conference on Computer Vision and Pattern Recognition.

[38]  Ko Sakai,et al.  Surrounding Suppression and Facilitation in the Determination of Border Ownership , 2006, Journal of Cognitive Neuroscience.

[39]  David J. Jilk,et al.  Early recurrent feedback facilitates visual object recognition under challenging conditions , 2014, Front. Psychol..

[40]  H. Adesnik,et al.  Lateral competition for cortical space by layer-specific horizontal circuits , 2010, Nature.

[41]  Nikolaus Kriegeskorte,et al.  Deep Supervised, but Not Unsupervised, Models May Explain IT Cortical Representation , 2014, PLoS Comput. Biol..

[42]  P. Fldik,et al.  The Speed of Sight , 2001, Journal of Cognitive Neuroscience.

[43]  Joel Z. Leibo,et al.  The dynamics of invariant object recognition in the human visual system. , 2014, Journal of neurophysiology.

[44]  Marcel A. J. van Gerven,et al.  Deep Neural Networks Reveal a Gradient in the Complexity of Neural Representations across the Ventral Stream , 2014, The Journal of Neuroscience.

[45]  Antonio Torralba,et al.  Comparison of deep neural networks to spatio-temporal cortical dynamics of human visual object recognition reveals hierarchical correspondence , 2016, Scientific Reports.

[46]  Jeffrey S. Johnson,et al.  The recognition of partially visible natural objects in the presence and absence of their occluders , 2005, Vision Research.

[47]  L. Tyler,et al.  Predicting the Time Course of Individual Objects with MEG , 2014, Cerebral cortex.

[48]  Radoslaw Martin Cichy,et al.  Resolving human object recognition in space and time , 2014, Nature Neuroscience.

[49]  Fraser W. Smith,et al.  Nonstimulated early visual areas carry information about surrounding context , 2010, Proceedings of the National Academy of Sciences.

[50]  Thomas Serre,et al.  A feedforward architecture accounts for rapid categorization , 2007, Proceedings of the National Academy of Sciences.

[51]  Michael S. Bernstein,et al.  ImageNet Large Scale Visual Recognition Challenge , 2014, International Journal of Computer Vision.