Arguments for the Unsuitability of Convolutional Neural Networks for Non-Local Tasks

Convolutional neural networks have established themselves in recent years as the state-of-the-art method for image classification, and on many datasets they even surpass humans in categorizing images. Unfortunately, the same architectures perform much worse when they must compare parts of an image to each other in order to classify it correctly. Until now, no well-formed theoretical argument has been presented to explain this deficiency. In this paper, we argue that convolutional layers are of little use for such problems because comparison tasks are global by nature, whereas convolutional layers are local by design. We use this insight to reformulate a comparison task as a sorting task and draw on findings about sorting networks to propose a lower bound on the number of parameters a neural network needs to solve comparison tasks in a generalizable way. We use this lower bound to argue that attention, as well as iterative or recurrent processing, is needed to prevent a combinatorial explosion.
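To make the link between comparison and sorting concrete, the following minimal sketch treats a same-different check over n items as a sorting problem. It is an illustration only, not the construction used in the paper: the odd-even transposition network is chosen for simplicity, and all function names are hypothetical. A fixed list of comparators sorts the items, after which any two equal items must be adjacent, so one local pass is enough to detect them.

```python
from typing import List, Sequence, Tuple

Comparator = Tuple[int, int]


def odd_even_transposition_network(n: int) -> List[Comparator]:
    """Build a fixed comparator list (i, i + 1) that sorts any n inputs.

    The wiring depends only on n, never on the data, which is what makes
    this a sorting network rather than an adaptive sorting algorithm.
    """
    comparators: List[Comparator] = []
    for round_idx in range(n):
        start = round_idx % 2  # alternate between even and odd pairs
        for i in range(start, n - 1, 2):
            comparators.append((i, i + 1))
    return comparators


def apply_network(values: Sequence[int], comparators: List[Comparator]) -> List[int]:
    """Run each compare-and-swap in order and return the resulting list."""
    out = list(values)
    for i, j in comparators:
        if out[i] > out[j]:
            out[i], out[j] = out[j], out[i]
    return out


def contains_same_pair(values: Sequence[int]) -> bool:
    """Same-different check: sort globally, then compare neighbours only."""
    network = odd_even_transposition_network(len(values))
    ordered = apply_network(values, network)
    return any(a == b for a, b in zip(ordered, ordered[1:]))


if __name__ == "__main__":
    print(len(odd_even_transposition_network(4)))
    print(contains_same_pair([5, 2, 7, 2]))
    print(contains_same_pair([5, 2, 7, 1]))
```

This simple network uses n(n-1)/2 comparators. More elaborate sorting networks are known to need far fewer, but the count still grows faster than linearly in n, which is the kind of growth a parameter lower bound for comparison tasks has to account for.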
