When words referring to actions or visual scenes are presented to humans, distributed networks that include areas of the motor and visual cortex become active [3]. The brain correlates of words and their referent actions and objects thus appear to be strongly coupled neuron ensembles in defined cortical areas. The theory of cell assemblies [1, 2], one of the most promising theoretical frameworks for modeling and understanding the brain, suggests that entities of the outside world (and also internal states) are coded in overlapping neuron assemblies rather than in single ("grandmother") cells, and that such cell assemblies are generated by Hebbian coincidence or correlation learning.

One of our long-term goals is to build a multimodal internal representation from several cortical areas or neuronal maps, which will serve as a basis for the emergence of action semantics, and to compare simulations of these areas with physiological activation of real cortical areas. In this work we have developed a cell-assembly-based model of several visual, language, planning, and motor areas that enables a robot to understand and react to simple spoken commands. The essential idea is that different cortical areas represent different aspects (and correspondingly different notions of similarity) of the same entity (e.g., the visual, auditory-language, semantic, syntactic, and grasping-related aspects of an apple), and that the (mostly bidirectional) long-range cortico-cortical projections act as hetero-associative memories that translate between these aspects or representations.

This system is used in a robotics context to enable a robot to respond to spoken commands such as "bot show plum" or "bot put apple to yellow cup". The scenario is a robot standing near one or two tables that carry certain kinds of fruit and/or other simple objects. We can demonstrate part of this scenario, in which the task is to find certain fruits in a complex visual scene according to spoken or typed commands. This involves parsing and understanding simple sentences, relating the nouns to concrete objects sensed by the camera, and coordinating motor output with planning and sensory processing.
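To make the two building blocks concrete, the following Python fragment is a minimal sketch (not the authors' implementation; the class name, pattern sizes, and the word/object patterns are all hypothetical) of a Willshaw/Palm-style binary associative memory trained by Hebbian coincidence (clipped outer-product) learning, used here hetero-associatively as a cortico-cortical projection that translates a word-area assembly into a visual-object-area assembly.

```python
import numpy as np

rng = np.random.default_rng(0)

def sparse_pattern(n, k):
    """Random binary cell assembly: k active neurons out of n."""
    p = np.zeros(n, dtype=np.uint8)
    p[rng.choice(n, size=k, replace=False)] = 1
    return p

class AssociativeMemory:
    """Willshaw-style binary memory with Hebbian (clipped) learning.

    Storing pairs (x, x) makes it auto-associative (a cortical area
    completing its own assemblies); storing pairs (x, y) makes it
    hetero-associative (a long-range projection translating between
    the representations of two areas).
    """
    def __init__(self, n_in, n_out):
        self.W = np.zeros((n_in, n_out), dtype=np.uint8)

    def store(self, x, y):
        # Hebbian coincidence learning: a synapse is switched on
        # (clipped to 1) whenever pre- and postsynaptic neurons
        # are active together.
        self.W |= np.outer(x, y)

    def recall(self, x, k):
        # Dendritic potentials followed by k-winners-take-all retrieval.
        s = x.astype(np.int32) @ self.W
        y = np.zeros_like(s, dtype=np.uint8)
        y[np.argsort(s)[-k:]] = 1
        return y

# Two toy "areas": a word area and a visual object area.
n_word, n_obj, k = 1000, 1000, 10
words = {w: sparse_pattern(n_word, k) for w in ("plum", "apple")}
objects = {w: sparse_pattern(n_obj, k) for w in ("plum", "apple")}

# Projection word area -> object area (hetero-association).
proj = AssociativeMemory(n_word, n_obj)
for w in words:
    proj.store(words[w], objects[w])

# Hearing "plum" activates the corresponding object assembly.
recalled = proj.recall(words["plum"], k)
print(np.array_equal(recalled, objects["plum"]))  # True with high probability at these toy sizes
```

Auto-associative completion within a single area would use the same class with n_in == n_out and store(x, x); iterative retrieval, which cleans up noisy or partial input over several recall steps, is a standard extension of such networks and is omitted here for brevity.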
[1] G. Palm et al., "Integrating object recognition, visual attention, language and action processing on a robot in a neurobiologically plausible associative architecture," 2004.
[2] G. Palm et al., "Associating words to visually recognized objects," 2004.
[3] A. Knoblauch et al., "Pattern separation and synchronization in spiking associative memories and visual areas," Neural Networks, 2001.
[4] D. O. Hebb, The Organization of Behavior: A Neuropsychological Theory, 1949.
[5] F. Pulvermüller, "Words in the brain's language," Behavioral and Brain Sciences, 1999.
[6] D. O. Hebb, The Organization of Behavior, 1988.