Multimodal Word Meaning Induction From Minimal Exposure to Natural Text

By the time they reach early adulthood, English speakers are familiar with the meanings of thousands of words. In recent decades, computational simulations known as distributional semantic models (DSMs) have demonstrated that it is possible to induce word meaning representations solely from word co-occurrence statistics extracted from large amounts of text. However, while these models learn in batch mode from large corpora, human word learning proceeds incrementally after minimal exposure to new words. In this study, we run a set of experiments investigating whether minimal distributional evidence from very short passages suffices to trigger successful word learning in subjects, testing their linguistic and visual intuitions about the concepts associated with new words. After confirming that subjects are indeed very efficient distributional learners, even from small amounts of evidence, we test a DSM on the same multimodal task and find that it behaves in a remarkably human-like way. We conclude that DSMs provide a convincing computational account of word learning even at the early stage at which a word is first encountered, and that the way they build meaning representations can offer new insights into human language acquisition.
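To make the core distributional idea concrete, here is a minimal sketch of a count-based DSM in Python: it builds sparse word vectors from co-occurrence counts and compares them with cosine similarity. The toy corpus, the symmetric window size, and the count-based variant are all assumptions made for illustration; the study itself evaluates a full-scale DSM on a multimodal task, and this sketch only shows the underlying principle that words appearing in similar contexts receive similar representations.

```python
# Minimal count-based distributional semantic model (DSM) sketch.
# NOTE: toy corpus and window size are assumptions for illustration,
# not the model or data used in the paper.

from collections import Counter, defaultdict
import math

corpus = [
    "the cat chased the mouse across the floor",
    "the dog chased the cat around the yard",
    "she poured milk for the cat and the dog",
    "the mouse ate cheese under the floor",
]

WINDOW = 2  # symmetric context window (assumed hyperparameter)

# Count how often each target word co-occurs with each context word.
cooc = defaultdict(Counter)
for sentence in corpus:
    tokens = sentence.split()
    for i, target in enumerate(tokens):
        lo, hi = max(0, i - WINDOW), min(len(tokens), i + WINDOW + 1)
        for j in range(lo, hi):
            if j != i:
                cooc[target][tokens[j]] += 1

def cosine(u: Counter, v: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(u[w] * v[w] for w in u if w in v)
    norm = math.sqrt(sum(c * c for c in u.values())) * \
           math.sqrt(sum(c * c for c in v.values()))
    return dot / norm if norm else 0.0

# Words occurring in similar contexts end up with similar vectors:
print(cosine(cooc["cat"], cooc["dog"]))     # relatively high
print(cosine(cooc["cat"], cooc["cheese"]))  # lower
```

Even on this tiny corpus, "cat" and "dog" come out more similar than "cat" and "cheese", since they share contexts such as "chased"; scaling the same mechanism to large corpora (and, in neural variants, to incremental prediction-based updates) yields the kind of representations the experiments probe.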
