Testing Deep Neural Networks on the Same-Different Task

Developing abstract reasoning abilities in neural networks is an important goal towards the achievement of human-like performances on many tasks. As of now, some works have tackled this problem, developing ad-hoc architectures and reaching overall good generalization performances. In this work we try to understand to what extent state-of-the-art convolutional neural networks for image classification are able to deal with a challenging abstract problem, the so-called same-different task. This problem consists in understanding if two random shapes inside the same image are the same or not. A recent work demonstrated that simple convolutional neural networks are almost unable to solve this problem. We extend their work, showing that ResNet-inspired architectures are able to learn, while VGG cannot converge. In light of this, we suppose that residual connections have some important role in the learning process, while the depth of the network seems not so relevant. In addition, we carry out some targeted tests on the converged architectures to figure out to what extent they are able to generalize to never seen patterns. However, further investigation is needed in order to understand what are the architectural peculiarities and limits as far as abstract reasoning is concerned.

[1]  David Mascharka,et al.  Transparency by Design: Closing the Gap Between Performance and Interpretability in Visual Reasoning , 2018, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition.

[2]  Razvan Pascanu,et al.  A simple neural network module for relational reasoning , 2017, NIPS.

[3]  Li Fei-Fei,et al.  CLEVR: A Diagnostic Dataset for Compositional Language and Elementary Visual Reasoning , 2016, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[4]  Weihong Deng,et al.  Very deep convolutional neural network based image classification using small training sample size , 2015, 2015 3rd IAPR Asian Conference on Pattern Recognition (ACPR).

[5]  Geoffrey E. Hinton,et al.  ImageNet classification with deep convolutional neural networks , 2012, Commun. ACM.

[6]  Yoshua Bengio,et al.  Gradient-based learning applied to document recognition , 1998, Proc. IEEE.

[7]  Felix Hill,et al.  Measuring abstract reasoning in neural networks , 2018, ICML.

[8]  Justus H. Piater,et al.  25 Years of CNNs: Can We Compare to Human Abstraction Capabilities? , 2016, ICANN.

[9]  Felice Dell'Orletta,et al.  Cross-Media Learning for Image Sentiment Analysis in the Wild , 2017, 2017 IEEE International Conference on Computer Vision Workshops (ICCVW).

[10]  Dumitru Erhan,et al.  Going deeper with convolutions , 2014, 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[11]  Xiao-Jing Wang,et al.  A dataset and architecture for visual reasoning with a working memory , 2018, ECCV.

[12]  Claudio Gennaro,et al.  Learning Relationship-Aware Visual Features , 2018, ECCV Workshops.

[13]  Jonas Kubilius,et al.  Evidence that recurrent circuits are critical to the ventral stream’s execution of core object recognition behavior , 2019, Nature Neuroscience.

[14]  Roberto Caldelli,et al.  Detecting adversarial example attacks to deep neural networks , 2017, CBMI.

[15]  Kaiming He,et al.  Mask R-CNN , 2017, 2017 IEEE International Conference on Computer Vision (ICCV).

[16]  Ali Farhadi,et al.  YOLOv3: An Incremental Improvement , 2018, ArXiv.

[17]  Jian Sun,et al.  Deep Residual Learning for Image Recognition , 2015, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[18]  Feng Gao,et al.  RAVEN: A Dataset for Relational and Analogical Visual REasoNing , 2019, 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

[19]  Aran Nayebi,et al.  CORnet: Modeling the Neural Mechanisms of Core Object Recognition , 2018, bioRxiv.

[20]  Thomas Serre,et al.  Not-So-CLEVR: Visual Relations Strain Feedforward Neural Networks , 2018, ICLR 2018.

[21]  Li Fei-Fei,et al.  Inferring and Executing Programs for Visual Reasoning , 2017, 2017 IEEE International Conference on Computer Vision (ICCV).

[22]  Ting Li,et al.  Comparing machines and humans on a visual categorization test , 2011, Proceedings of the National Academy of Sciences.

[23]  Andrea Esuli,et al.  Picture it in your mind: generating high level visual representations from textual descriptions , 2016, Information Retrieval Journal.