Letter perception emerges from unsupervised deep learning and recycling of natural image features

The use of written symbols is a major achievement of human cultural evolution. However, how abstract letter representations might be learned from vision is still an unsolved problem [1,2]. Here, we present a large-scale computational model of letter recognition based on deep neural networks [3,4], which develops a hierarchy of increasingly more complex internal representations in a completely unsupervised way by fitting a probabilistic, generative model to the visual input [5,6]. In line with the hypothesis that learning written symbols partially recycles pre-existing neuronal circuits for object recognition [7], earlier processing levels in the model exploit domain-general visual features learned from natural images, while domain-specific features emerge in upstream neurons following exposure to printed letters. We show that these high-level representations can be easily mapped to letter identities even for noise-degraded images, producing accurate simulations of a broad range of empirical findings on letter perception in human observers. Our model shows that by reusing natural visual primitives, learning written symbols only requires limited, domain-specific tuning, supporting the hypothesis that their shape has been culturally selected to match the statistical structure of natural environments [8].

Testolin et al. develop a computational model of letter perception based on deep learning and show that domain-general visual knowledge extracted from natural scenes is recycled for learning domain-specific cultural artefacts, such as printed letters.
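As a rough illustration of what fitting a hierarchical generative model without labels can look like, the sketch below trains a tiny stack of two restricted Boltzmann machines with one-step contrastive divergence on four made-up 3x3 binary glyphs. Everything in it is an assumption for illustration: the abstract does not name the architecture or learning rule, and the `RBM` class, the glyph patterns, the layer sizes, and the learning rate are hypothetical and are not taken from the authors' model.

```python
import math
import random


def sigmoid(x):
    # Clamp the input so math.exp cannot overflow.
    x = max(-60.0, min(60.0, x))
    return 1.0 / (1.0 + math.exp(-x))


class RBM:
    """Toy binary restricted Boltzmann machine (illustrative, not the paper's code)."""

    def __init__(self, n_visible, n_hidden, rng):
        self.rng = rng
        self.W = [[rng.gauss(0.0, 0.1) for _ in range(n_hidden)]
                  for _ in range(n_visible)]
        self.b_v = [0.0] * n_visible
        self.b_h = [0.0] * n_hidden

    def hidden_probs(self, v):
        return [sigmoid(self.b_h[j] + sum(v[i] * self.W[i][j] for i in range(len(v))))
                for j in range(len(self.b_h))]

    def visible_probs(self, h):
        return [sigmoid(self.b_v[i] + sum(h[j] * self.W[i][j] for j in range(len(h))))
                for i in range(len(self.b_v))]

    def cd1_step(self, v0, lr=0.1):
        """One contrastive-divergence (CD-1) update on a single example."""
        h0 = self.hidden_probs(v0)
        h0_sample = [1.0 if self.rng.random() < p else 0.0 for p in h0]
        v1 = self.visible_probs(h0_sample)  # reconstruction, as probabilities
        h1 = self.hidden_probs(v1)
        for i in range(len(v0)):
            for j in range(len(h0)):
                self.W[i][j] += lr * (v0[i] * h0[j] - v1[i] * h1[j])
            self.b_v[i] += lr * (v0[i] - v1[i])
        for j in range(len(h0)):
            self.b_h[j] += lr * (h0[j] - h1[j])

    def reconstruction_error(self, v):
        v1 = self.visible_probs(self.hidden_probs(v))
        return sum((a - b) ** 2 for a, b in zip(v, v1))


rng = random.Random(0)

# Hypothetical 3x3 "glyphs", flattened row by row; no labels are used in training.
glyphs = [
    [1, 1, 1, 1, 0, 1, 1, 1, 1],  # O-like
    [0, 1, 0, 0, 1, 0, 0, 1, 0],  # I-like
    [1, 0, 0, 1, 0, 0, 1, 1, 1],  # L-like
    [1, 1, 1, 0, 1, 0, 0, 1, 0],  # T-like
]
glyphs = [[float(p) for p in g] for g in glyphs]

# First layer: learn features directly from the pixels.
layer1 = RBM(9, 6, rng)
err_before = sum(layer1.reconstruction_error(g) for g in glyphs)
for _ in range(500):
    for g in glyphs:
        layer1.cd1_step(g)
err_after = sum(layer1.reconstruction_error(g) for g in glyphs)

# Second layer: greedy stacking, trained on the first layer's hidden activities.
codes = [layer1.hidden_probs(g) for g in glyphs]
layer2 = RBM(6, 4, rng)
for _ in range(500):
    for c in codes:
        layer2.cd1_step(c)
top_codes = [layer2.hidden_probs(c) for c in codes]

print("reconstruction error before / after:", err_before, err_after)
```

In a setup like this, a simple supervised read-out trained on `top_codes` would play the role of mapping high-level representations to letter identities. The sketch stops before that step and does not model the initial exposure to natural images.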

[1]  Jonathan Grainger,et al.  References and Notes , 2022 .

[2]  Marco Zorzi,et al.  Do current connectionist learning models account for reading development in different languages? , 2004, Cognition.

[3]  E. Candès,et al.  Ridgelets: a key to higher-dimensional intermittency? , 1999, Philosophical Transactions of the Royal Society of London. Series A: Mathematical, Physical and Engineering Sciences.

[4]  Jonathan Grainger,et al.  A Vision of Reading , 2016, Trends in Cognitive Sciences.

[5]  M. Sigman,et al.  The neural code for written words: a proposal , 2005, Trends in Cognitive Sciences.

[6]  Alberto Testolin,et al.  Modeling language and cognition with deep unsupervised learning: a tutorial overview , 2013, Front. Psychol..

[7]  D. Pelli,et al.  Measuring contrast sensitivity , 2013, Vision Research.

[8]  J. Muise,et al.  Alphabetic confusion: A clarification , 1985, Perception & psychophysics.

[9]  S. Dehaene,et al.  Cultural Recycling of Cortical Maps , 2007, Neuron.

[10]  Thad A. Polk,et al.  A Simple Common Contexts Explanation for the Development of Abstract Letter Identities , 1997, Neural Computation.

[11]  Anders Krogh,et al.  Introduction to the theory of neural computation , 1994, The advanced book program.

[12]  Alessandro Sperduti,et al.  Learning Orthographic Structure With Sequential Generative Neural Networks , 2016, Cogn. Sci..

[13]  Brenda Rapp,et al.  The effects of alphabet and expertise on letter perception. , 2016, Journal of experimental psychology. Human perception and performance.

[14]  David B. Boles,et al.  An upper- and lowercase alphabetic similarity matrix, with derived generation similarity values , 1989 .

[15]  Karl J. Friston.  The free-energy principle: a unified brain theory? , 2010, Nature Reviews Neuroscience.

[16]  Manuel Perea,et al.  Do serifs provide an advantage in the recognition of written words? , 2011 .

[17]  Blair C. Armstrong,et al.  The what, when, where, and how of visual word recognition , 2014, Trends in Cognitive Sciences.

[18]  C Bundesen,et al.  A template-matching pandemonium recognizes unconstrained handwritten characters with high accuracy , 1996, Memory & cognition.

[19]  Shane T. Mueller,et al.  Alphabetic letter identification: Effects of perceivability, similarity, and bias , 2011 .

[20]  Jonathan Grainger,et al.  Inverse discrimination time as a perceptual distance for alphabetic characters , 2004 .

[21]  Bror Zachrisson,et al.  Studies in the legibility of printed text , 1965 .

[22]  Alberto Testolin,et al.  Probabilistic Models and Generative Neural Networks: Towards an Unified Framework for Modeling Normal and Impaired Neurocognitive Functions , 2016, Front. Comput. Neurosci..

[23]  D. Bub,et al.  Features for Identification of Uppercase and Lowercase Letters , 2008, Psychological science.

[24]  S. Dehaene.  Reading in the Brain: The New Science of How We Read , 2009 .

[25]  Charles A. Perfetti,et al.  Visual complexity in orthographic learning: Modeling learning across writing system variations , 2016 .

[26]  Jonathan Grainger,et al.  Testing computational models of letter perception with item-level event-related potentials , 2009, Cognitive neuropsychology.

[27]  Marco Zorzi,et al.  Emergence of a 'visual number sense' in hierarchical generative models , 2012, Nature Neuroscience.

[28]  Ian C Simpson,et al.  A letter visual-similarity matrix for Latin-based alphabets , 2013, Behavior research methods.

[29]  Aapo Hyvärinen,et al.  Natural Image Statistics - A Probabilistic Approach to Early Computational Vision , 2009, Computational Imaging and Vision.

[30]  Garrison W. Cottrell,et al.  Looking around the backyard helps to recognize faces and digits , 2008, 2008 IEEE Conference on Computer Vision and Pattern Recognition.

[31]  J. Townsend.  Theoretical analysis of an alphabetic confusion matrix , 1971 .

[32]  Steven M. Seitz,et al.  Photo tourism: exploring photo collections in 3D , 2006, ACM Trans. Graph..

[33]  Denis G. Pelli,et al.  The visual filter mediating letter identification , 1994, Nature.

[34]  D. Hubel,et al.  Receptive fields and functional architecture of monkey striate cortex , 1968, The Journal of physiology.

[35]  S. Dehaene,et al.  The unique role of the visual word form area in reading , 2011, Trends in Cognitive Sciences.

[36]  Rajesh P. N. Rao,et al.  Predictive coding in the visual cortex: a functional interpretation of some extra-classical receptive-field effects. , 1999 .

[37]  D. Pelli,et al.  Feature detection and letter identification , 2006, Vision Research.

[38]  Mark S. Seidenberg,et al.  Phonology, reading acquisition, and dyslexia: insights from connectionist models. , 1999, Psychological review.

[39]  Qiong Zhang,et al.  The Structures of Letters and Symbols throughout Human History Are Selected to Match Those Found in Objects in Natural Scenes , 2006, The American Naturalist.

[40]  Max Coltheart,et al.  Letter recognition: From perception to representation , 2009, Cognitive neuropsychology.

[41]  K O Johnson,et al.  A comparison of visual and two modes of tactual letter resolution , 1983, Perception & psychophysics.

[42]  D. J. Felleman,et al.  Distributed hierarchical processing in the primate cerebral cortex. , 1991, Cerebral cortex.

[43]  Guigang Zhang,et al.  Deep Learning , 2016, Int. J. Semantic Comput..

[44]  A. V. D. Heijden,et al.  An empirical interletter confusion matrix for continuous-line capitals , 1984, Perception & psychophysics.

[45]  Yoshua Bengio,et al.  Deep Learning of Representations for Unsupervised and Transfer Learning , 2011, ICML Unsupervised and Transfer Learning.

[46]  G C Gilmore,et al.  Multidimensional letter similarity derived from recognition errors , 1979, Perception & psychophysics.

[47]  William White,et al.  A Proposal , 2008, Moon, Sun, and Witches.

[48]  Orrin Devinsky,et al.  Sequential then Interactive Processing of Letters and Words in the Left Fusiform Gyrus , 2012, Nature Communications.

[49]  T. Poggio,et al.  Hierarchical models of object recognition in cortex , 1999, Nature Neuroscience.

[50]  Michele De Filippo De Grazia,et al.  Deep Unsupervised Learning on a Desktop PC: A Primer for Cognitive Scientists , 2013, Front. Psychol..

[51]  G. Sperling,et al.  Object spatial frequencies, retinal spatial frequencies, noise, and the efficiency of letter discrimination , 1991, Vision Research.

[52]  B. Strom,et al.  In clarification. , 2007, Pharmacoepidemiology and drug safety.

[53]  J M Loomis,et al.  Analysis of tactile and visual confusion matrices , 1982, Perception & psychophysics.

[54]  Shane T. Mueller,et al.  Alphabetic letter identification: effects of perceivability, similarity, and bias. , 2012, Acta psychologica.

[55]  Régine Kolinsky,et al.  Illiterate to literate: behavioural and cerebral changes induced by reading acquisition , 2015, Nature Reviews Neuroscience.

[56]  Geoffrey E. Hinton.  Training Products of Experts by Minimizing Contrastive Divergence , 2002, Neural Computation.

[57]  Terrence J. Sejnowski,et al.  The “independent components” of natural scenes are edge filters , 1997, Vision Research.

[58]  Marco Zorzi,et al.  Modelling reading development through phonological decoding and self-teaching: implications for dyslexia , 2014, Philosophical Transactions of the Royal Society B: Biological Sciences.

[59]  Marco Zorzi,et al.  Deep generative learning of location-invariant visual word recognition , 2013, Front. Psychol..

[60]  Liang She,et al.  Spatial structure of neuronal receptive field in awake monkey secondary visual cortex (V2) , 2016, Proceedings of the National Academy of Sciences.

[61]  J. Ziegler,et al.  Extra-large letter spacing improves reading in dyslexia , 2012, Proceedings of the National Academy of Sciences.

[62]  Bruno A. Olshausen,et al.  Highly overcomplete sparse coding , 2013, Electronic Imaging.

[63]  Catherine E. Snow,et al.  Preventing reading difficulties in young children , 1998 .

[64]  J. Grainger,et al.  Letter perception: from pixels to pandemonium , 2008, Trends in Cognitive Sciences.

[65]  Geoffrey E. Hinton.  Learning multiple layers of representation , 2007, Trends in Cognitive Sciences.

[66]  Gordon E. Legge,et al.  Psychophysics of Reading in Normal and Low Vision , 2006 .

[67]  John B. Shoven,et al.  I , Edinburgh Medical and Surgical Journal.

[68]  W. R. Garner,et al.  Reaction time as a measure of inter- and intraobject visual similarity: Letters of the alphabet , 1979 .

[69]  David J. Field,et al.  Emergence of simple-cell receptive field properties by learning a sparse code for natural images , 1996, Nature.

[70]  D. Pelli,et al.  The role of spatial frequency channels in letter identification , 2002, Vision Research.

[71]  Michael L. Anderson.  Neural reuse: A fundamental organizational principle of the brain , 2010, Behavioral and Brain Sciences.

[72]  Eero P. Simoncelli,et al.  Natural image statistics and neural representation. , 2001, Annual review of neuroscience.

[73]  James L. McClelland,et al.  An interactive activation model of context effects in letter perception: I. An account of basic findings. , 1981 .

[74]  Geoffrey E. Hinton,et al.  Reducing the Dimensionality of Data with Neural Networks , 2006, Science.

[75]  James J. DiCarlo,et al.  How Does the Brain Solve Visual Object Recognition? , 2012, Neuron.

[76]  Stanislas Dehaene,et al.  Adaptation of the human visual system to the statistics of letters and line configurations , 2015, NeuroImage.

[77]  Denis G. Pelli,et al.  The remarkable inefficiency of word recognition , 2003, Nature.