Learning to perform actions through multimodal interaction

Anthropomorphic robots are expected to collaborate with users as human-like assistants. For instance, a user can control a robot's behavior through speech: when the user issues a spoken command such as “pick up the apple on the desk”, the robot can perform the corresponding action and bring the apple to the user. To achieve this goal, the robot needs to acquire two basic sensorimotor skills. First, it needs to map action verbs (linguistic labels in speech) to the corresponding actions. Second, it needs to know how to perform those actions. How does the robot acquire these skills? A new approach was proposed by Weng et al. (2001), in which a brain-like artificial embodied system develops and learns through real-time interaction with the environment, using multiple sensors and effectors. This work presents the first steps toward this kind of autonomous learning. Specifically, the embodied system described in this paper is able to acquire these basic skills through natural interaction with users.
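
As a rough illustration of the two skills only (not the learning architecture described in this paper), the sketch below separates the label-to-action mapping from action execution. All names (VerbGrounding, MotorPrimitive, the hand-coded primitives) are hypothetical; in the actual system the verb-action association is acquired through interaction rather than supplied directly.

```python
# Minimal sketch of the two sensorimotor skills, under the assumptions above.
from typing import Callable, Dict

MotorPrimitive = Callable[[str], None]  # an action applied to a named object

def pick_up(obj: str) -> None:
    print(f"executing: pick up the {obj}")

class VerbGrounding:
    """Skill 1: associate spoken verb labels with actions.
    Skill 2: carry out the associated motor primitive."""

    def __init__(self) -> None:
        self.table: Dict[str, MotorPrimitive] = {}

    def learn(self, verb: str, primitive: MotorPrimitive) -> None:
        # Here the association is given explicitly; the system in this paper
        # would instead acquire it from multimodal interaction with a user.
        self.table[verb] = primitive

    def perform(self, verb: str, obj: str) -> None:
        primitive = self.table.get(verb)
        if primitive is None:
            print(f"unknown verb: {verb!r} (not yet learned)")
            return
        primitive(obj)

if __name__ == "__main__":
    robot = VerbGrounding()
    robot.learn("pick up", pick_up)    # skill 1: map the verb to an action
    robot.perform("pick up", "apple")  # skill 2: perform the action
```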