Challenge of Spatial Cognition for Deep Learning

Given the success of the deep convolutional neural networks (DCNNs) in applications of visual recognition and classification, it would be tantalizing to test if DCNNs can also learn spatial concepts, such as straightness, convexity, left/right, front/back, relative size, aspect ratio, polygons, etc., from varied visual examples of these concepts that are simple and yet vital for spatial reasoning. Much to our dismay, extensive experiments of the type of cognitive psychology demonstrate that the data-driven deep learning (DL) cannot see through superficial variations in visual representations and grasp the spatial concept in abstraction. The root cause of failure turns out to be the learning methodology, not the computational model of the neural network itself. By incorporating task-specific convolutional kernels, we are able to construct DCNNs for spatial cognition tasks that can generalize to input images not drawn from the same distribution of the training set. This work raises a precaution that without manually-incorporated priors or features DCCNs may fail spatial cognitive tasks at rudimentary level.

[1]  Yoshua Bengio,et al.  Generative Adversarial Nets , 2014, NIPS.

[2]  Kilian Q. Weinberger,et al.  Densely Connected Convolutional Networks , 2016, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[3]  E. Miller,et al.  Coding of Cognitive Magnitude Compressed Scaling of Numerical Information in the Primate Prefrontal Cortex , 2003, Neuron.

[4]  Xiaogang Wang,et al.  Deep Learning Face Representation by Joint Identification-Verification , 2014, NIPS.

[5]  Terry Winograd,et al.  Understanding natural language , 1974 .

[6]  D. Burr,et al.  A Visual Sense of Number , 2007, Current Biology.

[7]  Geoffrey E. Hinton,et al.  ImageNet classification with deep convolutional neural networks , 2012, Commun. ACM.

[8]  Andrew Zisserman,et al.  Very Deep Convolutional Networks for Large-Scale Image Recognition , 2014, ICLR.

[9]  Xi Zhang,et al.  Cognitive Deficit of Deep Learning in Numerosity , 2018, AAAI.

[10]  Dumitru Erhan,et al.  Going deeper with convolutions , 2014, 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[11]  Jun Du,et al.  Two Is Harder To Recognize Than Tom: the Challenge of Visual Numerosity for Deep Learning , 2018, ArXiv.

[12]  Jian Sun,et al.  Deep Residual Learning for Image Recognition , 2015, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[13]  Hector J. Levesque,et al.  The Winograd Schema Challenge , 2011, AAAI Spring Symposium: Logical Formalizations of Commonsense Reasoning.

[14]  P. Cochat,et al.  Et al , 2008, Archives de pediatrie : organe officiel de la Societe francaise de pediatrie.

[15]  Fei Xu,et al.  Numerosity discrimination in infants: Evidence for two systems of representations , 2003, Cognition.

[16]  R. Passingham The hippocampus as a cognitive map J. O'Keefe & L. Nadel, Oxford University Press, Oxford (1978). 570 pp., £25.00 , 1979, Neuroscience.

[17]  H S Terrace,et al.  Ordering of the numerosities 1 to 9 by monkeys. , 1998, Science.

[18]  N Burgess,et al.  Place cells, navigational accuracy, and the human hippocampus. , 1998, Philosophical transactions of the Royal Society of London. Series B, Biological sciences.

[19]  S. Dehaene,et al.  The Number Sense: How the Mind Creates Mathematics. , 1998 .

[20]  James Philbin,et al.  FaceNet: A unified embedding for face recognition and clustering , 2015, 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[21]  B. P. Klein,et al.  Topographic Representation of Numerosity in the Human Parietal Cortex , 2013, Science.

[22]  Samuel Ritter,et al.  Cognitive Psychology for Deep Neural Networks: A Shape Bias Case Study , 2017, ICML.

[23]  Xiaogang Wang,et al.  DeepID3: Face Recognition with Very Deep Neural Networks , 2015, ArXiv.