Evaluating the progress of deep learning for visual relational concepts

Convolutional neural networks have become the state-of-the-art method for image classification over the last 10 years. Although they achieve superhuman classification accuracy on many popular datasets, they often perform much worse on more abstract image classification tasks. We will show that these difficult tasks are linked to relational concepts from cognitive psychology and that, despite progress over the last few years, such relational reasoning tasks remain difficult for current neural network architectures. We will review deep learning research that is linked to relational concept learning, even if it was not originally presented from this angle. Based on this review of the current literature, we will argue that some form of attention will be an important component of future systems that solve relational tasks. In addition, we will point out the shortcomings of currently used datasets and recommend steps to make future datasets more relevant for testing systems on relational reasoning.
