Paper information - Letter perception emerges from unsupervised deep learning and recycling of natural image features

The use of written symbols is a major achievement of human cultural evolution. However, how abstract letter representations might be learned from vision is still an unsolved problem¹,². Here, we present a large-scale computational model of letter recognition based on deep neural networks³,⁴, which develops a hierarchy of increasingly more complex internal representations in a completely unsupervised way by fitting a probabilistic, generative model to the visual input⁵,⁶. In line with the hypothesis that learning written symbols partially recycles pre-existing neuronal circuits for object recognition⁷, earlier processing levels in the model exploit domain-general visual features learned from natural images, while domain-specific features emerge in upstream neurons following exposure to printed letters. We show that these high-level representations can be easily mapped to letter identities even for noise-degraded images, producing accurate simulations of a broad range of empirical findings on letter perception in human observers. Our model shows that by reusing natural visual primitives, learning written symbols only requires limited, domain-specific tuning, supporting the hypothesis that their shape has been culturally selected to match the statistical structure of natural environments⁸.

Testolin et al. develop a computational model of letter perception based on deep learning and show that domain-general visual knowledge extracted from natural scenes is recycled for learning domain-specific cultural artefacts, such as printed letters.
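The three stages described in the abstract (domain-general features learned from natural images, domain-specific features learned on top of them from letters, and a simple mapping to letter identities) can be illustrated with a minimal sketch. The abstract does not give implementation details, so everything here is an assumption: the sketch uses two stacked restricted Boltzmann machines trained with one-step contrastive divergence (the method of Hinton's contrastive divergence paper in the reference list), a least-squares linear read-out, and toy 4x4 bar images standing in for both natural patches and letters. The names `RBM`, `make_bars`, `make_letters` and `train_pipeline` are hypothetical and do not come from the paper.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

class RBM:
    """Restricted Boltzmann machine trained with one-step contrastive divergence."""
    def __init__(self, n_visible, n_hidden, rng):
        self.rng = rng
        self.W = 0.01 * rng.standard_normal((n_visible, n_hidden))
        self.b_vis = np.zeros(n_visible)
        self.b_hid = np.zeros(n_hidden)

    def hidden_probs(self, v):
        return sigmoid(v @ self.W + self.b_hid)

    def visible_probs(self, h):
        return sigmoid(h @ self.W.T + self.b_vis)

    def cd1_step(self, v0, lr=0.1):
        # Positive phase: hidden activations driven by the data.
        h0 = self.hidden_probs(v0)
        h0_sample = (self.rng.random(h0.shape) < h0).astype(float)
        # Negative phase: one reconstruction step.
        v1 = self.visible_probs(h0_sample)
        h1 = self.hidden_probs(v1)
        n = v0.shape[0]
        self.W += lr * (v0.T @ h0 - v1.T @ h1) / n
        self.b_vis += lr * (v0 - v1).mean(axis=0)
        self.b_hid += lr * (h0 - h1).mean(axis=0)
        return float(np.mean((v0 - v1) ** 2))  # reconstruction error

def make_bars(rng, n, size=4):
    """Toy stand-in for natural patches: one horizontal or vertical bar per image."""
    imgs = np.zeros((n, size, size))
    for img in imgs:
        k = rng.integers(size)
        if rng.random() < 0.5:
            img[k, :] = 1.0
        else:
            img[:, k] = 1.0
    return imgs.reshape(n, -1)

def make_letters(rng, n, size=4, noise=0.05):
    """Toy stand-in for letters: each class is a fixed pair of bars plus pixel noise."""
    protos = np.zeros((3, size, size))
    protos[0, 0, :] = 1.0
    protos[0, :, 0] = 1.0
    protos[1, -1, :] = 1.0
    protos[1, :, 0] = 1.0
    protos[2, 0, :] = 1.0
    protos[2, :, -1] = 1.0
    labels = rng.integers(3, size=n)
    x = protos[labels].reshape(n, -1)
    flip = rng.random(x.shape) < noise
    return np.where(flip, 1.0 - x, x), labels

def train_pipeline(seed=0, epochs=200):
    rng = np.random.default_rng(seed)
    # Stage 1: domain-general layer, trained only on the "natural" toy patches.
    natural = make_bars(rng, 200)
    layer1 = RBM(16, 12, rng)
    errors1 = [layer1.cd1_step(natural) for _ in range(epochs)]
    # Stage 2: layer 1 is frozen and reused; layer 2 is trained on letters.
    letters, labels = make_letters(rng, 300)
    h1 = layer1.hidden_probs(letters)
    layer2 = RBM(12, 10, rng)
    for _ in range(epochs):
        layer2.cd1_step(h1)
    # Stage 3: linear read-out from top-layer features to letter identities.
    feats = layer2.hidden_probs(h1)
    feats = np.hstack([feats, np.ones((len(feats), 1))])  # bias column
    targets = np.eye(3)[labels]
    readout = np.linalg.lstsq(feats, targets, rcond=None)[0]
    accuracy = float(np.mean((feats @ readout).argmax(axis=1) == labels))
    return errors1, readout, accuracy

if __name__ == "__main__":
    errors1, readout, accuracy = train_pipeline()
    print("layer-1 reconstruction error:", errors1[0], "->", errors1[-1])
    print("read-out training accuracy:", accuracy)
```

The point of the sketch is the training order rather than the numbers it prints: the first layer never sees a letter, and only the second layer and the read-out are fitted to the letter data, which mirrors the "limited, domain-specific tuning" the abstract refers to.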

[1] Jonathan Grainger,et al. References and Notes , 2022 .

[2] Marco Zorzi,et al. Do current connectionist learning models account for reading development in different languages? , 2004, Cognition.

[3] E. Candès,et al. Ridgelets: a key to higher-dimensional intermittency? , 1999, Philosophical Transactions of the Royal Society of London. Series A: Mathematical, Physical and Engineering Sciences.

[4] Jonathan Grainger,et al. A Vision of Reading , 2016, Trends in Cognitive Sciences.

[5] M. Sigman,et al. The neural code for written words: a proposal , 2005, Trends in Cognitive Sciences.

[6] Alberto Testolin,et al. Modeling language and cognition with deep unsupervised learning: a tutorial overview , 2013, Front. Psychol..

[7] D. Pelli,et al. Measuring contrast sensitivity , 2013, Vision Research.

[8] J. Muise,et al. Alphabetic confusion: A clarification , 1985, Perception & psychophysics.

[9] S. Dehaene,et al. Cultural Recycling of Cortical Maps , 2007, Neuron.

[10] Thad A. Polk,et al. A Simple Common Contexts Explanation for the Development of Abstract Letter Identities , 1997, Neural Computation.

[11] Anders Krogh,et al. Introduction to the theory of neural computation , 1994, The advanced book program.

[12] Alessandro Sperduti,et al. Learning Orthographic Structure With Sequential Generative Neural Networks , 2016, Cogn. Sci..

[13] Brenda Rapp,et al. The effects of alphabet and expertise on letter perception. , 2016, Journal of experimental psychology. Human perception and performance.

[14] David B. Boles,et al. An upper- and lowercase alphabetic similarity matrix, with derived generation similarity values , 1989 .

[15] Karl J. Friston. The free-energy principle: a unified brain theory? , 2010, Nature Reviews Neuroscience.

[16] Manuel Perea,et al. Do serifs provide an advantage in the recognition of written words? , 2011 .

[17] Blair C. Armstrong,et al. The what, when, where, and how of visual word recognition , 2014, Trends in Cognitive Sciences.

[18] C Bundesen,et al. A template-matching pandemonium recognizes unconstrained handwritten characters with high accuracy , 1996, Memory & cognition.

[19] Shane T. Mueller,et al. Alphabetic letter identification: Effects of perceivability, similarity, and bias , 2011 .

[20] Jonathan Grainger,et al. Inverse discrimination time as a perceptual distance for alphabetic characters , 2004 .

[21] Bror Zachrisson,et al. Studies in the legibility of printed text , 1965 .

[22] Alberto Testolin,et al. Probabilistic Models and Generative Neural Networks: Towards an Unified Framework for Modeling Normal and Impaired Neurocognitive Functions , 2016, Front. Comput. Neurosci..

[23] D. Bub,et al. Features for Identification of Uppercase and Lowercase Letters , 2008, Psychological science.

[24] S. Dehaene. Reading in the Brain: The New Science of How We Read , 2009 .

[25] Charles A. Perfetti,et al. Visual complexity in orthographic learning: Modeling learning across writing system variations , 2016 .

[26] Jonathan Grainger,et al. Testing computational models of letter perception with item-level event-related potentials , 2009, Cognitive neuropsychology.

[27] Marco Zorzi,et al. Emergence of a 'visual number sense' in hierarchical generative models , 2012, Nature Neuroscience.

[28] Ian C Simpson,et al. A letter visual-similarity matrix for Latin-based alphabets , 2013, Behavior research methods.

[29] Aapo Hyvärinen,et al. Natural Image Statistics - A Probabilistic Approach to Early Computational Vision , 2009, Computational Imaging and Vision.

[30] Garrison W. Cottrell,et al. Looking around the backyard helps to recognize faces and digits , 2008, 2008 IEEE Conference on Computer Vision and Pattern Recognition.

[31] J. Townsend. Theoretical analysis of an alphabetic confusion matrix , 1971 .

[32] Steven M. Seitz,et al. Photo tourism: exploring photo collections in 3D , 2006, ACM Trans. Graph..

[33] Denis G. Pelli,et al. The visual filter mediating letter identification , 1994, Nature.

[34] D. Hubel,et al. Receptive fields and functional architecture of monkey striate cortex , 1968, The Journal of physiology.

[35] S. Dehaene,et al. The unique role of the visual word form area in reading , 2011, Trends in Cognitive Sciences.

[36] Rajesh P. N. Rao,et al. Predictive coding in the visual cortex: a functional interpretation of some extra-classical receptive-field effects. , 1999 .

[37] D. Pelli,et al. Feature detection and letter identification , 2006, Vision Research.

[38] Mark S. Seidenberg,et al. Phonology, reading acquisition, and dyslexia: insights from connectionist models. , 1999, Psychological review.

[39] Qiong Zhang,et al. The Structures of Letters and Symbols throughout Human History Are Selected to Match Those Found in Objects in Natural Scenes , 2006, The American Naturalist.

[40] Max Coltheart,et al. Letter recognition: From perception to representation , 2009, Cognitive neuropsychology.

[41] K O Johnson,et al. A comparison of visual and two modes of tactual letter resolution , 1983, Perception & psychophysics.

[42] D. J. Felleman,et al. Distributed hierarchical processing in the primate cerebral cortex. , 1991, Cerebral cortex.

[43] Guigang Zhang,et al. Deep Learning , 2016, Int. J. Semantic Comput..

[44] A. V. D. Heijden,et al. An empirical interletter confusion matrix for continuous-line capitals , 1984, Perception & psychophysics.

[45] Yoshua Bengio,et al. Deep Learning of Representations for Unsupervised and Transfer Learning , 2011, ICML Unsupervised and Transfer Learning.

[46] G C Gilmore,et al. Multidimensional letter similarity derived from recognition errors , 1979, Perception & psychophysics.

[47] William White,et al. A Proposal , 2008, Moon, Sun, and Witches.

[48] Orrin Devinsky,et al. Sequential then Interactive Processing of Letters and Words in the Left Fusiform Gyrus , 2012, Nature Communications.

[49] T. Poggio,et al. Hierarchical models of object recognition in cortex , 1999, Nature Neuroscience.

[50] Michele De Filippo De Grazia,et al. Deep Unsupervised Learning on a Desktop PC: A Primer for Cognitive Scientists , 2013, Front. Psychol..

[51] G. Sperling,et al. Object spatial frequencies, retinal spatial frequencies, noise, and the efficiency of letter discrimination , 1991, Vision Research.

[52] B. Strom,et al. In clarification. , 2007, Pharmacoepidemiology and drug safety.

[53] J M Loomis,et al. Analysis of tactile and visual confusion matrices , 1982, Perception & psychophysics.

[54] Shane T. Mueller,et al. Alphabetic letter identification: effects of perceivability, similarity, and bias. , 2012, Acta psychologica.

[55] Régine Kolinsky,et al. Illiterate to literate: behavioural and cerebral changes induced by reading acquisition , 2015, Nature Reviews Neuroscience.

[56] Geoffrey E. Hinton. Training Products of Experts by Minimizing Contrastive Divergence , 2002, Neural Computation.

[57] Terrence J. Sejnowski,et al. The “independent components” of natural scenes are edge filters , 1997, Vision Research.

[58] Marco Zorzi,et al. Modelling reading development through phonological decoding and self-teaching: implications for dyslexia , 2014, Philosophical Transactions of the Royal Society B: Biological Sciences.

[59] Marco Zorzi,et al. Deep generative learning of location-invariant visual word recognition , 2013, Front. Psychol..

[60] Liang She,et al. Spatial structure of neuronal receptive field in awake monkey secondary visual cortex (V2) , 2016, Proceedings of the National Academy of Sciences.

[61] J. Ziegler,et al. Extra-large letter spacing improves reading in dyslexia , 2012, Proceedings of the National Academy of Sciences.

[62] Bruno A. Olshausen,et al. Highly overcomplete sparse coding , 2013, Electronic Imaging.

[63] Catherine E. Snow,et al. Preventing reading difficulties in young children , 1998 .

[64] J. Grainger,et al. Letter perception: from pixels to pandemonium , 2008, Trends in Cognitive Sciences.

[65] Geoffrey E. Hinton. Learning multiple layers of representation , 2007, Trends in Cognitive Sciences.

[66] Gordon E. Legge,et al. Psychophysics of Reading in Normal and Low Vision , 2006 .

[67] John B. Shoven,et al. I , Edinburgh Medical and Surgical Journal.

[68] W. R. Garner,et al. Reaction time as a measure of inter- and intraobject visual similarity: Letters of the alphabet , 1979 .

[69] David J. Field,et al. Emergence of simple-cell receptive field properties by learning a sparse code for natural images , 1996, Nature.

[70] D. Pelli,et al. The role of spatial frequency channels in letter identification , 2002, Vision Research.

[71] Michael L. Anderson. Neural reuse: A fundamental organizational principle of the brain , 2010, Behavioral and Brain Sciences.

[72] Eero P. Simoncelli,et al. Natural image statistics and neural representation. , 2001, Annual review of neuroscience.

[73] James L. McClelland,et al. An interactive activation model of context effects in letter perception: I. An account of basic findings. , 1981 .

[74] Geoffrey E. Hinton,et al. Reducing the Dimensionality of Data with Neural Networks , 2006, Science.

[75] James J. DiCarlo,et al. How Does the Brain Solve Visual Object Recognition? , 2012, Neuron.

[76] Stanislas Dehaene,et al. Adaptation of the human visual system to the statistics of letters and line configurations , 2015, NeuroImage.

[77] Denis G. Pelli,et al. The remarkable inefficiency of word recognition , 2003, Nature.