Disentangling neural mechanisms for perceptual grouping

Forming perceptual groups and individuating objects in visual scenes is an essential step towards visual intelligence. This ability is thought to arise in the brain from computations implemented by bottom-up, horizontal, and top-down connections between neurons. However, the relative contributions of these connections to perceptual grouping are poorly understood. We address this question by systematically evaluating neural network architectures featuring combinations of these connections on two synthetic visual tasks, one stressing low-level 'gestalt' cues and the other stressing high-level object cues for perceptual grouping. We show that increasing the difficulty of either task strains learning for networks that rely solely on bottom-up processing. Horizontal connections resolve this limitation on tasks with gestalt cues by supporting incremental spatial propagation of activities, whereas top-down connections rescue learning on tasks featuring object cues by propagating coarse predictions about the position of the target object. Our findings dissociate the computational roles of bottom-up, horizontal, and top-down connectivity, and demonstrate how a model featuring all of these interactions can more flexibly learn to form perceptual groups.
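The idea of incremental spatial propagation can be illustrated with a toy sketch. The code below is not the model evaluated in this work; it is a hand-written, non-learned stand-in in which activity spreads from a seed pixel to its 4-connected neighbours on the same curve, one step per iteration. The names `propagate` and `reaches_end`, the binary `mask` representation, and the use of straight one-pixel-wide curves are all assumptions made for illustration. The point it shows is only that a fixed number of local update steps covers a fixed curve length, so longer curves need more iterations.

```python
def propagate(mask, seed, steps):
    """Spread activity from `seed` along `mask` using 4-neighbour updates.

    mask  : list of lists of 0/1, where 1 marks pixels that lie on a curve
    seed  : (row, col) of the starting pixel, assumed to lie on a curve
    steps : maximum number of local update iterations (hypothetical parameter)
    """
    rows, cols = len(mask), len(mask[0])
    active = {seed}
    for _ in range(steps):
        grown = set(active)
        for r, c in active:
            for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1)):
                nr, nc = r + dr, c + dc
                # Activity only spreads onto neighbouring curve pixels.
                if 0 <= nr < rows and 0 <= nc < cols and mask[nr][nc]:
                    grown.add((nr, nc))
        if grown == active:  # nothing new was reached, so stop early
            break
        active = grown
    return active


def reaches_end(length, steps):
    """Check whether activity seeded at one end of a straight curve of
    `length` pixels arrives at the other end within `steps` iterations."""
    mask = [[1] * length]
    return (0, length - 1) in propagate(mask, (0, 0), steps)


# Two parallel curves separated by a blank row: a target and a distractor.
two_curves = [[1] * 6, [0] * 6, [1] * 6]
target_group = propagate(two_curves, (0, 0), steps=100)
```

In this sketch, `reaches_end(5, 4)` succeeds while `reaches_end(20, 4)` does not, which loosely mirrors how a fixed budget of local computation is strained as the curve gets longer. In `two_curves`, the activity seeded on the top curve never leaks onto the bottom one, so the set of active pixels marks a single perceptual group.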
