A Neural Network Model of Lexical Competition during Infant Spoken Word Recognition

Visual world studies show that, upon hearing a word in a target-absent visual context containing related and unrelated items, toddlers and adults briefly direct their gaze towards phonologically related items before shifting towards semantically and visually related ones. We present a neural network model that processes dynamically unfolding phonological representations and maps them to static internal semantic and visual representations. The model, trained on representations derived from real corpora, simulates this early preference for phonological over semantic/visual competitors. Our results support the hypothesis that the incremental unfolding of a spoken word is in itself sufficient to account for the transient preference for phonological competitors over unrelated items as well as semantically and visually related ones. Phonological representations mapped dynamically, in a bottom-up fashion, to semantic-visual representations capture the early phonological preference effects reported in a visual world task; the semantic-visual preference observed later in such a trial does not require top-down feedback from a semantic or visual system.
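
To make the mapping concrete, the sketch below shows one way such a model could be set up in PyTorch: a recurrent network receives one phoneme slice per time step and is trained to output a fixed semantic-visual target vector, so its step-by-step readout can be probed as the word unfolds. This is a minimal illustration under stated assumptions, not the published architecture; the class name `PhonToSemVis`, the one-hot phoneme coding, the GRU, the layer sizes, and the mean-squared-error objective are all illustrative choices.

```python
# Minimal sketch (illustrative, not the authors' exact model) of a network that
# maps a dynamically unfolding phonological input to a static semantic-visual
# representation. Assumptions: one-hot phoneme input, a GRU recurrent layer,
# a fixed-length target vector, and an MSE training objective.
import torch
import torch.nn as nn

N_PHONEMES = 40     # assumed phoneme inventory size
SEM_VIS_DIM = 200   # assumed size of the static semantic-visual vector
HIDDEN_DIM = 128    # assumed recurrent hidden size


class PhonToSemVis(nn.Module):
    def __init__(self):
        super().__init__()
        self.rnn = nn.GRU(N_PHONEMES, HIDDEN_DIM, batch_first=True)
        self.readout = nn.Linear(HIDDEN_DIM, SEM_VIS_DIM)

    def forward(self, phonemes):
        # phonemes: (batch, time, N_PHONEMES), one phoneme slice per time step.
        # The readout is applied at every step, so the semantic-visual estimate
        # can be inspected as the word unfolds.
        hidden, _ = self.rnn(phonemes)
        return self.readout(hidden)  # (batch, time, SEM_VIS_DIM)


if __name__ == "__main__":
    model = PhonToSemVis()
    optimizer = torch.optim.SGD(model.parameters(), lr=0.01, momentum=0.9)
    loss_fn = nn.MSELoss()

    # Toy batch: 8 words, 6 phoneme slices each, random stand-in targets.
    phon = torch.rand(8, 6, N_PHONEMES)
    target = torch.rand(8, SEM_VIS_DIM)

    for _ in range(10):
        optimizer.zero_grad()
        pred = model(phon)
        # Train the prediction at the final time step against the static target.
        loss = loss_fn(pred[:, -1, :], target)
        loss.backward()
        optimizer.step()
```

At test time, the per-step output could be compared (e.g. by cosine similarity) with the representations of phonological and semantic/visual competitors to simulate looking preferences over the course of the word.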
