Exploring The Spatial Reasoning Ability of Neural Models in Human IQ Tests

Although neural models have performed impressively well on various tasks such as image recognition and question answering, their reasoning ability has been measured in only a few studies. In this work, we focus on spatial reasoning and explore the spatial understanding of neural models. First, we describe two spatial reasoning IQ tests: rotation and shape composition. Using well-defined rules, we construct datasets that span various complexity levels. We design a variety of generalization experiments and evaluate six baseline models on the newly generated datasets. We analyze the results and the factors that affect the models' generalization abilities. We also analyze how neural models solve spatial reasoning tests with visual aids. We hope that our work encourages further research into human-level spatial reasoning and provides a new direction for future work.