Understanding Grounded Language Learning Agents

Neural network-based systems can now learn to locate the referents of words and phrases in images, answer questions about visual scenes, and even execute symbolic instructions as first-person actors in partially observable worlds. To achieve this so-called grounded language learning, models must overcome certain well-studied learning challenges that are also fundamental to infants learning their first words. While it is notable that models with no meaningful prior knowledge overcome these learning obstacles, AI researchers and practitioners currently lack a clear understanding of exactly how they do so. Here we address this question as a way of achieving a clearer general understanding of grounded language learning, both to inform future research and to improve confidence in model predictions. For maximum control and generality, we focus on a simple neural network-based language learning agent trained via policy-gradient methods to interpret synthetic linguistic instructions in a simulated 3D world. We apply experimental paradigms from developmental psychology to this agent, exploring the conditions under which established human biases and learning effects emerge. We further propose a novel way to visualise and analyse semantic representation in grounded language learning agents that yields a plausible computational account of the observed effects.
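
The abstract describes the agent only at a high level, but the setup it names can be made concrete. The sketch below is a minimal, assumption-laden illustration in PyTorch, not the paper's actual architecture: a small CNN encodes each first-person frame, an embedding layer encodes the instruction tokens (a mean over embeddings stands in for whatever instruction encoder the paper uses), and an LSTM fuses the two into a recurrent state from which actor-critic heads are computed, as required by policy-gradient training such as A3C. The layer sizes, the 84x84 frame resolution, and the 8-action space are all illustrative assumptions.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class GroundedAgent(nn.Module):
    """Hypothetical sketch: fuses a first-person visual frame with an
    instruction and outputs an action policy plus a value estimate."""

    def __init__(self, vocab_size=100, n_actions=8, hidden=128):
        super().__init__()
        # Vision: a small CNN over 3x84x84 RGB frames (sizes assumed).
        self.conv = nn.Sequential(
            nn.Conv2d(3, 16, kernel_size=8, stride=4), nn.ReLU(),
            nn.Conv2d(16, 32, kernel_size=4, stride=2), nn.ReLU(),
        )
        self.vis_fc = nn.Linear(32 * 9 * 9, hidden)
        # Language: embed instruction tokens; a mean over embeddings
        # stands in for the paper's (unspecified here) encoder.
        self.embed = nn.Embedding(vocab_size, hidden)
        # Memory: an LSTM over the fused visual-linguistic input.
        self.lstm = nn.LSTMCell(2 * hidden, hidden)
        # Actor-critic heads, as used by policy-gradient methods like A3C.
        self.policy = nn.Linear(hidden, n_actions)
        self.value = nn.Linear(hidden, 1)

    def forward(self, frame, tokens, state):
        v = F.relu(self.vis_fc(self.conv(frame).flatten(1)))
        l = self.embed(tokens).mean(dim=1)            # (batch, hidden)
        h, c = self.lstm(torch.cat([v, l], dim=1), state)
        return F.softmax(self.policy(h), dim=-1), self.value(h), (h, c)
```

A single policy-gradient (REINFORCE-style) update for one step, with a placeholder return standing in for the environment's reward signal, might then look like this:

```python
agent = GroundedAgent()
frame = torch.randn(1, 3, 84, 84)                # one visual observation
tokens = torch.randint(0, 100, (1, 4))           # e.g. "pick the red ball"
state = (torch.zeros(1, 128), torch.zeros(1, 128))
probs, value, state = agent(frame, tokens, state)
action = torch.multinomial(probs, 1).item()      # sample an action
ret = 1.0                                        # placeholder return
advantage = ret - value.item()                   # baseline detached from critic
loss = -torch.log(probs[0, action]) * advantage + (ret - value.squeeze()) ** 2
loss.backward()                                  # gradients for an optimiser step
```

The abstract likewise does not specify the proposed method for visualising semantic representations. As a generic, hypothetical stand-in (not the paper's technique), hidden states recorded while the agent follows different instructions can be projected to two dimensions with t-SNE to look for semantic clustering:

```python
import numpy as np
from sklearn.manifold import TSNE

states = np.random.randn(200, 128)  # stand-in for logged LSTM hidden states
coords = TSNE(n_components=2, perplexity=30).fit_transform(states)
# If the agent has learned grounded meanings, states recorded under
# semantically related words (e.g. colour terms) should cluster together.
```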
