Do Deep Neural Networks Suffer from Crowding?

Crowding is a visual effect suffered by humans, in which an object that can be recognized in isolation can no longer be recognized when other objects, called flankers, are placed close to it. In this work, we study the effect of crowding in artificial Deep Neural Networks (DNNs) for object recognition. We analyze both deep convolutional neural networks (DCNNs) as well as an extension of DCNNs that are multi-scale and that change the receptive field size of the convolution filters with their position in the image. The latter networks, that we call eccentricity-dependent, have been proposed for modeling the feedforward path of the primate visual cortex. Our results reveal that the eccentricity-dependent model, trained on target objects in isolation, can recognize such targets in the presence of flankers, if the targets are near the center of the image, whereas DCNNs cannot. Also, for all tested networks, when trained on targets in isolation, we find that recognition accuracy of the networks decreases the closer the flankers are to the target and the more flankers there are. We find that visual similarity between the target and flankers also plays a role and that pooling in early layers of the network leads to more crowding. Additionally, we show that incorporating flankers into the images of the training set for learning the DNNs does not lead to robustness against configurations not seen at training.

[1]  William P. Banks,et al.  The asymmetry of lateral interference in visual letter identification , 1977 .

[2]  Yann LeCun,et al.  The mnist database of handwritten digits , 2005 .

[3]  Kaiming He,et al.  Faster R-CNN: Towards Real-Time Object Detection with Region Proposal Networks , 2015, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[4]  R. Rosenholtz,et al.  Pooling of continuous features provides a unifying account of crowding , 2016, Journal of vision.

[5]  Qi Zhao,et al.  Foveation-based Mechanisms Alleviate Adversarial Examples , 2015, ArXiv.

[6]  Bruno A. Olshausen,et al.  Emergence of foveal image sampling from learning to attend in visual scenes , 2016, ICLR.

[7]  Joshua B. Tenenbaum,et al.  Human-level concept learning through probabilistic program induction , 2015, Science.

[8]  Bolei Zhou,et al.  Learning Deep Features for Scene Recognition using Places Database , 2014, NIPS.

[9]  Tomaso A. Poggio,et al.  Eccentricity Dependent Deep Neural Networks: Modeling Invariance in Human Vision , 2017, AAAI Spring Symposia.

[10]  D. Levi Crowding—An essential bottleneck for object recognition: A mini-review , 2008, Vision Research.

[11]  Dumitru Erhan,et al.  Going deeper with convolutions , 2014, 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[12]  Eero P. Simoncelli,et al.  Metamers of the ventral stream , 2011, Nature Neuroscience.

[13]  D. Levi,et al.  The effect of similarity and duration on spatial interaction in peripheral vision. , 1994, Spatial vision.

[14]  R. Rosenholtz,et al.  A summary-statistic representation in peripheral vision explains visual crowding. , 2009, Journal of vision.

[15]  Jian Sun,et al.  Deep Residual Learning for Image Recognition , 2015, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[16]  Tatjana A. Nazir,et al.  Effects of lateral masking and spatial precueing on gap-resolution in central and peripheral vision , 1992, Vision Research.

[17]  Tomaso A. Poggio,et al.  Computational role of eccentricity dependent cortical magnification , 2014, ArXiv.

[18]  Michael H. Herzog,et al.  Cortical Dynamics of Perceptual Grouping and Segmentation: Crowding , 2016 .

[19]  Rob Fergus,et al.  Visualizing and Understanding Convolutional Networks , 2013, ECCV.

[20]  Andrew Zisserman,et al.  Very Deep Convolutional Networks for Large-Scale Image Recognition , 2014, ICLR.

[21]  Geoffrey E. Hinton,et al.  ImageNet classification with deep convolutional neural networks , 2012, Commun. ACM.

[22]  Yoshua Bengio,et al.  Gradient-based learning applied to document recognition , 1998, Proc. IEEE.

[23]  H. BOUMA,et al.  Interaction Effects in Parafoveal Letter Recognition , 1970, Nature.

[24]  D. Levi,et al.  Visual crowding: a fundamental limit on conscious perception and object recognition , 2011, Trends in Cognitive Sciences.

[25]  Alex Graves,et al.  Recurrent Models of Visual Attention , 2014, NIPS.

[26]  Jonathon Shlens,et al.  Explaining and Harnessing Adversarial Examples , 2014, ICLR.

[27]  R. Rosenholtz,et al.  A summary statistic representation in peripheral vision explains visual search. , 2009, Journal of vision.