Connectionist Modeling of Situated Language Processing: Language and Meaning Acquisition from an Embodiment Perspective

Helmut Weldle (helmut.weldle@misc.uni-freiburg.de), Lars Konieczny (lars@cognition.uni-freiburg.de), Daniel Muller (daniel@cognition.uni-freiburg.de), Sascha Wolfer (sascha@cognition.uni-freiburg.de), Peter Baumann (peter.baumann@cognition.uni-freiburg.de)
Center for Cognitive Science, University of Freiburg, Friedrichstr. 50, D-79098 Freiburg i. Br., Germany

Abstract

Recent connectionist models and theories of embodied cognition offer new perspectives on language comprehension. We review the latest accounts of the issue and present an SRN-based model that incorporates ideas from embodiment theories while avoiding (1) vast architectural complexity, (2) explicit structured semantic input, and (3) separate training regimens for processing components.

Keywords: language acquisition, comprehension, production; sentence processing; language-vision integration; visual attention; embodied cognition; connectionist modeling; SRNs.

Introduction

'Gavagai!'

If we heard a native speaker of a foreign language utter this word upon seeing a rabbit, we would face the problem Quine described in Ontological Relativity (1968): how do we know what exactly an utterance refers to in the infinitely rich set of objects, events, and relations our environment provides? Yet this problem appears almost trivial compared to that of a human child confronted with the task of acquiring its mother's language. Several sub-tasks have to be solved simultaneously to ground speech in referential meaning: there is the problem of a highly complex world rich in details, happenings, and relations; the problem of a continuous stream of words; the large problem of relating the one to the other; and the problem that there is no previously given language to help establish this relation. In other words, the task is to bind a holistic situation to a sequential series of related linguistic expressions. This requires integrating representations of language and of the outside world, each represented in a distinct form, following completely different rules, and depending on different hierarchical and causal relations.

A central aspect of models of language comprehension and acquisition is how they account for these questions. In connectionist models, language interpretation and the integration of situational context are based on mechanisms of association and self-organization. Theories in embodied cognition research offer an account of the assignment of linguistic structures to the construction of coherent semantic interpretations. Language comprehension is considered a simulation of the hearer's perceptual experiences, and the linguistic structure serves as an instruction for the correct construction of the situation. Due to its analogical nature, this could be a guideline for subsymbolic accounts of grounding language comprehension.

Connectionist models of language comprehension

Several connectionist architectures deal with the task of language comprehension and the integration of language and events, proposing different realizations of semantic representation and implementations of the integration process. Rohde (2002) introduced the Connectionist Sentence Comprehension and Production Model, an architecture based on extended simple recurrent networks (SRNs; Elman, 1990) that is capable of comprehending and producing complex sentences and covers a wide range of well-known empirical phenomena. The model clearly focuses on scalability, however, possibly at the expense of explanatory power and psychological plausibility.

Especially relevant for our issue is the realization of the semantic component: Rohde's model is trained with explicit propositional representations prior to the corresponding target sentence. This greatly assists the network, leaving no way to tell whether it succeeds simply because the semantic representation provides all crucial information explicitly. Learning of the propositions is achieved through a query mechanism that inquires about each of their parts, a process of questionable cognitive adequacy. The problem of explicit information holds similarly for the Incremental Nonmonotonic Self-organization of Meaning Network (Mayberry, 2003). The semantic representations used in the model are based on Minimal Recursion Semantics (Copestake et al., 2005), which makes them very powerful and complex information carriers. The model is capable of parsing natural language corpora, an impressive achievement, but one reached at the expense of a highly complex, opaque architecture and pre-fabricated semantic content. In a more recent study, Mayberry, Crocker and Knoeferle (2005) introduced the Coordinated Interplay Account Network, which integrates a scene representation with the incremental input of a sentence description, enabling adaptive use of context information. Since the presented scenes are externally segmented into agent, action, and patient, the major part of the semantic interpretation is provided to the model explicitly. While these models certainly achieved good results with respect to their aims, they show several shortcomings that make them unsuitable for our approach. Firstly, the extensive use of different layers and components makes it impossible to deduce which structures and working mechanisms are responsible.
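The SRN architecture these models build on can be sketched compactly: a hidden layer receives the current input together with a copy of its own previous activation (the context layer), giving the network a memory of the sequence so far. The following is a minimal, untrained forward pass in Python/numpy; layer sizes and weight scales are illustrative assumptions, not parameters of any of the cited models.

```python
import numpy as np

rng = np.random.default_rng(0)

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

class SRN:
    """Minimal Elman-style simple recurrent network (forward pass only)."""

    def __init__(self, n_in, n_hidden, n_out):
        self.W_in = rng.normal(0, 0.1, (n_hidden, n_in))       # input -> hidden
        self.W_ctx = rng.normal(0, 0.1, (n_hidden, n_hidden))  # context -> hidden
        self.W_out = rng.normal(0, 0.1, (n_out, n_hidden))     # hidden -> output
        self.context = np.zeros(n_hidden)                      # previous hidden state

    def step(self, x):
        # Hidden activation combines the current input with the copied-back
        # context; the context layer is then updated to an exact copy of the
        # new hidden state, as in Elman (1990).
        h = sigmoid(self.W_in @ x + self.W_ctx @ self.context)
        self.context = h.copy()
        return sigmoid(self.W_out @ h)

# Feed a sequence of one-hot-encoded "words" through the network; in a
# trained next-word-prediction SRN, each output would be read as a
# distribution over possible continuations.
net = SRN(n_in=4, n_hidden=8, n_out=4)
sequence = np.eye(4)  # four one-hot words
outputs = [net.step(word) for word in sequence]
```

Because the context layer feeds back into the hidden layer at every step, the output for a given word depends on the whole preceding sequence, which is what lets SRN-based comprehension models process sentences incrementally.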

References

[1] Marshall R. Mayberry, et al. A Connectionist Model of Sentence Comprehension in Visual Worlds. 2005.

[2] Dan Flickinger, et al. Minimal Recursion Semantics: An Introduction. 2005.

[3] E. Rosch, et al. Cognition and Categorization. 1980.

[4] M. V. Velzen, et al. Self-organizing maps. 2007.

[5] Jeffrey L. Elman, et al. Finding Structure in Time. 1990. Cognitive Science.

[6] David C. Plaut, et al. A connectionist model of sentence comprehension and production. 2002.

[7] I. Rooij, et al. Connectionist semantic systematicity. 2009. Cognition.

[8] Peter Ford Dominey. Learning Grammatical Constructions in a Miniature Language from Narrated Video Events. 2003.

[9] L. Barsalou, et al. Whither structured representation? 1999. Behavioral and Brain Sciences.

[10] Risto Miikkulainen, et al. Incremental nonmonotonic parsing through semantic self-organization. 2003.

[11] Rolf A. Zwaan. The Immersed Experiencer: Toward an Embodied Theory of Language Comprehension. 2003.

[12] James L. McClelland, et al. Distributed memory and the representation of general and specific information. 1985. Journal of Experimental Psychology: General.

[13] Mark Steedman, et al. Connectionist sentence processing in perspective. 1999. Cognitive Science.

[14] Mathieu Koppen, et al. Modeling multiple levels of text representation. 2007.

[15] T. Rohde. LENS: The light, efficient network simulator. 1999.

[16] Kenny R. Coventry, et al. Spatial Prepositions and Vague Quantifiers: Implementing the Functional Geometric Framework. 2004. Spatial Cognition.

[17] Charles A. Perfetti, et al. Higher level language processes in the brain: inference and comprehension processes. 2007.

[18] Lynn V. Richards, et al. On the foundations of perceptual symbol systems: Specifying embodied representations via connectionism. 2003.