Paper Information - Two Is Harder To Recognize Than Tom: the Challenge of Visual Numerosity for Deep Learning

In the spirit of the Turing test, we design and conduct a set of visual numerosity experiments with deep neural networks. We train deep convolutional neural networks (DCNNs) on a large number of sample images that are varied visual representations of small natural numbers, with the objective of learning numerosity perception. Numerosity perception, or the number sense, is a cognitive construct so primary and so critical to the survival and well-being of our species that it is considered, and has been shown, to be innate in human infants; it responds to visual stimuli before the development of any symbolic skills, language, or arithmetic. Somewhat surprisingly, in our experiments, even with strong supervision, DCNNs cannot see through superficial variations in visual representations and distill the abstract notion of natural number, a task that children perform with high accuracy and confidence. DCNNs are apparently easily confused by geometric variations and fail to grasp the topological essence of numerosity. The failures of DCNNs in the proposed cognition experiments also expose their overreliance on sample statistics at the expense of image semantics. We believe our findings are significant and thought-provoking for AI research, because vision-based numerosity is a benchmark of the most minimal sort for human intelligence.
